*** NCDR has joined the channel | 2019-05-06 07:58:56 |
*** NCDR has left the channel | 2019-05-06 08:10:29 |
<FunkyBob> | o/ | 2019-05-06 12:54:24 |
<Shelwien> | ? | 2019-05-06 12:56:53 |
<FunkyBob> | waving hello | 2019-05-06 13:02:16 |
| just got to day1 of the PyCon sprints | 2019-05-06 13:02:33 |
| what timezone are you in, Shelwien ? | 2019-05-06 13:02:41 |
*** maniscalco_ has left the channel | 2019-05-06 13:08:29 |
<Shelwien> | utc+2 | 2019-05-06 13:14:42 |
| just tested lztp with enwik8 | 2019-05-06 13:15:02 |
| 32k window: 40,674,412 | 2019-05-06 13:15:25 |
| 64k window: 40,570,224 | 2019-05-06 13:15:47 |
| greedy parsing | 2019-05-06 13:15:54 |
<FunkyBob> | lztp? | 2019-05-06 13:16:00 |
| what's the algorithm? | 2019-05-06 13:16:27 |
<Shelwien> | a mostly-bytewise lz scheme from tplink router | 2019-05-06 13:16:31 |
<FunkyBob> | ah | 2019-05-06 13:16:35 |
| that's quite impressive | 2019-05-06 13:16:39 |
<Shelwien> | i reverse-engineered it some time ago | 2019-05-06 13:16:47 |
| its kinda like lzss | 2019-05-06 13:16:57 |
| but with an interesting quirk | 2019-05-06 13:17:09 |
<FunkyBob> | ulz can get about that, in ~1min on my machine | 2019-05-06 13:17:16 |
<Shelwien> | d16 encoding takes 6s on enwik8 | 2019-05-06 13:17:57 |
<FunkyBob> | d16? | 2019-05-06 13:18:36 |
<Shelwien> | 1<<16 window | 2019-05-06 13:18:53 |
<FunkyBob> | ah | 2019-05-06 13:18:57 |
<Shelwien> | aka 64k | 2019-05-06 13:18:58 |
<FunkyBob> | for reference, how long does gzip -9 take? | 2019-05-06 13:19:13 |
<Shelwien> | 5.4s | 2019-05-06 13:19:40 |
| 4.1s for lztp with d15 | 2019-05-06 13:20:43 |
<FunkyBob> | hrm | 2019-05-06 13:21:21 |
<Shelwien> | anyway, what i wanted to say | 2019-05-06 13:21:27 |
<FunkyBob> | gzip -9 enwik8 for me takes ~29sec | 2019-05-06 13:21:34 |
<Shelwien> | dunno, it's not at its best atm | 2019-05-06 13:21:50 |
<FunkyBob> | mmm? | 2019-05-06 13:21:50 |
<Shelwien> | i have 4 threads of optimizer running | 2019-05-06 13:22:00 |
| ...is that lzss just uses packed bits for literal/match flags | 2019-05-06 13:22:28 |
| while lztp developer went a step further | 2019-05-06 13:22:47 |
<FunkyBob> | hmm... must have had my laptop unplugged when I ran that.. it's only ~17sec now | 2019-05-06 13:23:08 |
| Shelwien: mmm? | 2019-05-06 13:23:16 |
<Shelwien> | and turned it into an actual interleaved bitstream | 2019-05-06 13:23:16 |
| in lzss its just a flag byte | 2019-05-06 13:23:34 |
<FunkyBob> | I'm toying with the idea of a larger history window, perhaps 20 bits | 2019-05-06 13:23:41 |
| ah, so a byte indicating the next 8 terms? | 2019-05-06 13:24:02 |
<Shelwien> | but in lztp they also put bits of variable-length len/dist there | 2019-05-06 13:24:10 |
<FunkyBob> | I was thinking I could use the first nibble for (match, size of len, 2bit size-of-offset) | 2019-05-06 13:24:36 |
<Shelwien> | and then there's a need to rearrange bytes from bit stream and byte stream | 2019-05-06 13:24:45 |
<FunkyBob> | where offset would be 4bit, 12bit, 20bit, or rep | 2019-05-06 13:24:46 |
<Shelwien> | in order of how decoder would read them | 2019-05-06 13:24:52 |
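The flag-byte LZSS scheme Shelwien is contrasting with lztp's interleaved bitstream can be sketched as a decode loop. The layout here is hypothetical (one control byte announcing 8 items, 16-bit little-endian distance, 8-bit length, flag bit set = literal), not the actual TP-LINK format:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Flag-byte LZSS decode: one control byte announces the next 8 items,
// bit set = literal byte, bit clear = (distance, length) copy from the
// already-decoded output. Hypothetical 2-byte distance + 1-byte length.
std::string lzss_decode(const std::vector<uint8_t>& in) {
    std::string out;
    size_t pos = 0;
    while (pos < in.size()) {
        uint8_t flags = in[pos++];
        for (int bit = 0; bit < 8 && pos < in.size(); ++bit) {
            if (flags & (1u << bit)) {               // literal
                out.push_back((char)in[pos++]);
            } else {                                 // match
                uint16_t dist = in[pos] | (in[pos + 1] << 8);
                uint8_t  len  = in[pos + 2];
                pos += 3;
                for (int i = 0; i < len; ++i)        // overlapping copies allowed
                    out.push_back(out[out.size() - dist]);
            }
        }
    }
    return out;
}
```

lztp's twist, as described above, is that the variable-length len/dist bits also live in the bit stream, so the encoder must reorder the byte stream to match the decoder's read order.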
<FunkyBob> | ah, yes, I recall reading something of this | 2019-05-06 13:25:08 |
<Shelwien> | but otherwise i think this scheme has potential | 2019-05-06 13:25:14 |
<FunkyBob> | clearly | 2019-05-06 13:27:12 |
<Shelwien> | btw, i was talking about this: https://encode.ru/threads/3005-TP-LINK-router-config-compression | 2019-05-06 13:30:02 |
<FunkyBob> | yeah, I recall the thread | 2019-05-06 13:33:59 |
| am building up to trying a more "optimal" parsing... forward scan, reverse walk thing | 2019-05-06 13:58:22 |
| clearly it'll be slower, as it will require a match scan at _every_ byte | 2019-05-06 13:58:41 |
<Shelwien> | yes, but only at encoding, so that's ok | 2019-05-06 14:13:31 |
| the common optimization is like "fb" thing in lzma | 2019-05-06 14:13:52 |
| you can set the threshold match len for greedy encoding | 2019-05-06 14:14:07 |
| ie if len>32, just encode the match | 2019-05-06 14:14:17 |
| its also efficient for splitting the data into separately optimized segments | 2019-05-06 14:14:44 |
| so lzma manages to be streamable despite optimal parsing | 2019-05-06 14:15:37 |
| also, i'd suggest looking into BWT | 2019-05-06 14:16:33 |
| its the most efficient way to find all matches at once :) | 2019-05-06 14:16:58 |
<FunkyBob> | :) | 2019-05-06 14:17:22 |
| have just tried gating lazy parsing on match < 32... no noticeable difference for enwik8... trying 9 | 2019-05-06 14:21:54 |
| clearly I need to pick a different cutoff point... but, yah, I see the potential | 2019-05-06 14:33:50 |
<Shelwien> | there's also usually a depth limit on hash chain ("mc" in lzma) | 2019-05-06 14:35:17 |
<FunkyBob> | currently I search back as far as the window allows | 2019-05-06 14:43:06 |
<Shelwien> | yeah, but that can fail on redundant data | 2019-05-06 14:44:02 |
<FunkyBob> | "fail" how? | 2019-05-06 14:44:26 |
<Shelwien> | like, generate the file like "abcd" x 1000000 | 2019-05-06 14:44:29 |
| and test the speed | 2019-05-06 14:44:32 |
<FunkyBob> | right, ok | 2019-05-06 14:44:44 |
| degenerate case | 2019-05-06 14:44:47 |
<Shelwien> | unfortunately its surprisingly common | 2019-05-06 14:45:04 |
| just consider filetypes like bmp or wav | 2019-05-06 14:45:13 |
| or some log files | 2019-05-06 14:45:26 |
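The failure mode being warned about, and the "mc" depth cap that fixes it, can be sketched with a minimal hash-chain match finder. On `"abcd" x 1000000` every position lands on the same few chains, so an uncapped walk makes encoding quadratic; the cap bounds work per byte. All names and the hash/min-match constants here are illustrative:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Match { int dist = 0; int len = 0; };

// Hash-chain match finder with a depth cap (lzma's "mc"): positions with
// the same 3-byte hash are chained; we walk at most max_chain candidates.
class ChainFinder {
public:
    explicit ChainFinder(int max_chain) : max_chain_(max_chain), head_(kHashSize, -1) {}

    // Find the longest match for buf[pos..], then insert pos into its chain.
    // Must be called with strictly increasing pos.
    Match next(const std::string& buf, int pos) {
        Match best;
        if (pos + kMin <= (int)buf.size()) {
            uint32_t h = hash(&buf[pos]);
            int cand = head_[h], depth = 0;
            while (cand >= 0 && depth++ < max_chain_) {   // the cap
                int len = 0;
                while (pos + len < (int)buf.size() && buf[cand + len] == buf[pos + len])
                    ++len;
                if (len > best.len) best = {pos - cand, len};
                cand = cand < (int)prev_.size() ? prev_[cand] : -1;
            }
            prev_.resize(pos + 1, -1);
            prev_[pos] = head_[h];                        // link into chain
            head_[h] = pos;
        }
        return best;
    }

private:
    static constexpr int kMin = 3, kHashSize = 1 << 15;
    uint32_t hash(const char* p) const {
        return ((uint8_t)p[0] * 506832829u ^ (uint8_t)p[1] * 2654435761u
                ^ (uint8_t)p[2]) % kHashSize;
    }
    int max_chain_;
    std::vector<int> head_, prev_;
};
```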
<FunkyBob> | hrm | 2019-05-06 14:46:06 |
<unic0rn> | or XPADDING in exe | 2019-05-06 14:47:12 |
| altough that's not that long | 2019-05-06 14:47:25 |
<Shelwien> | exes too, yeah | 2019-05-06 14:48:16 |
| they tend to have lots of tables and unrolled code | 2019-05-06 14:48:26 |
<FunkyBob> | ok, so, thinking out loud... if I scan forward, and at each position store the offset of the best match, but at the _end_ [here + match len] store the len... I can traverse back easily by knowing how far back to look for the match | 2019-05-06 15:12:26 |
| and only update the length if it's longer than the previous stored one | 2019-05-06 15:12:48 |
<Shelwien> | yes | 2019-05-06 15:14:33 |
| well, in general its slightly different | 2019-05-06 15:15:08 |
<FunkyBob> | then again, I can store the full match details | 2019-05-06 15:15:11 |
| at the offset of the end of the match | 2019-05-06 15:15:22 |
| Shelwien: oh? | 2019-05-06 15:15:26 |
<Shelwien> | 1) you have to replace the token if its codelen is smaller, rather than just match len | 2019-05-06 15:15:46 |
<FunkyBob> | yes | 2019-05-06 15:16:01 |
| but for now, all my tokens are the same size | 2019-05-06 15:16:09 |
<Shelwien> | for example, shorter match might have shorter code | 2019-05-06 15:16:10 |
| 2) you also have to count literal runs the same way | 2019-05-06 15:16:28 |
<FunkyBob> | shall we say "only update if it's of a greater saving" | 2019-05-06 15:16:33 |
<Shelwien> | yes | 2019-05-06 15:16:45 |
| its basically recursion | 2019-05-06 15:17:27 |
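The forward-scan / reverse-walk parse, with Shelwien's two corrections (replace a token when its *code length* is smaller, and cost literal runs the same way), can be sketched as a shortest-path pass over positions. The token costs here are hypothetical fixed sizes (1 byte per literal byte, 3 bytes per match), in line with FunkyBob's "all my tokens are the same size"; `mlen[i]` is assumed to come from a separate match finder:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Tok { int len; bool match; };

// cost[i] = cheapest encoding of the first i bytes.  Each position relaxes
// a literal edge and, if a match exists there, a match edge -- keeping
// whichever yields the smaller total code length, not the longer match.
std::vector<Tok> optimal_parse(const std::vector<int>& mlen,  // best match len per pos
                               int lit_cost, int match_cost) {
    int n = (int)mlen.size();
    const int INF = 1 << 30;
    std::vector<int> cost(n + 1, INF), from(n + 1, -1);
    std::vector<bool> via_match(n + 1, false);
    cost[0] = 0;
    for (int i = 0; i < n; ++i) {
        if (cost[i] == INF) continue;
        if (cost[i] + lit_cost < cost[i + 1]) {                 // literal edge
            cost[i + 1] = cost[i] + lit_cost;
            from[i + 1] = i; via_match[i + 1] = false;
        }
        if (mlen[i] >= 3 && i + mlen[i] <= n &&
            cost[i] + match_cost < cost[i + mlen[i]]) {         // match edge
            cost[i + mlen[i]] = cost[i] + match_cost;
            from[i + mlen[i]] = i; via_match[i + mlen[i]] = true;
        }
    }
    std::vector<Tok> out;                                       // reverse walk
    for (int i = n; i > 0; i = from[i])
        out.push_back(Tok{i - from[i], (bool)via_match[i]});
    std::reverse(out.begin(), out.end());
    return out;
}
```

A real coder would also merge the consecutive literal steps into runs and use per-token code lengths instead of constants.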
<FunkyBob> | interesting... lz4 works on a max of 4MB blocks | 2019-05-06 15:18:07 |
*** danlock has joined the channel | 2019-05-06 16:08:16 |
<Shelwien> | hi | 2019-05-06 16:09:55 |
<danlock> | Greetings. I'm just looking around; I just came from the encode forums. | 2019-05-06 16:12:07 |
<FunkyBob> | o/ | 2019-05-06 16:12:50 |
| I'm playing around with possible lit/mat/offset encodings... find something decent that's not just "LZ4 again" | 2019-05-06 16:14:44 |
<danlock> | :) I appreciate the welcome. I don't have any compression thoughts or questions right now, though. @FunkyBob good luck! | 2019-05-06 16:15:10 |
<Shelwien> | FunkyBob: btw, how about making a lookup table for actual len values? | 2019-05-06 16:16:00 |
| danlock: it doesn't really have to be compression | 2019-05-06 16:16:24 |
| we just had a discussion about AI and quantum mechanics :) | 2019-05-06 16:16:42 |
<FunkyBob> | Shelwien: hrm | 2019-05-06 16:16:56 |
<Shelwien> | like, all len values up to 16 or so, and then something exponent-based | 2019-05-06 16:17:40 |
| and pack the len to 5 bits or so, then use the saved bits for flags | 2019-05-06 16:18:11 |
<danlock> | It's good to know I'm in here with like-minded people. | 2019-05-06 16:18:49 |
<FunkyBob> | Shelwien: I wonder how much I'd lose by lowering match resolution as length grows... | 2019-05-06 16:20:16 |
<Shelwien> | based on freqtables that i made - not much | 2019-05-06 16:20:51 |
<FunkyBob> | like you were daying, exponent based.... encode unit + multiplier | 2019-05-06 16:21:06 |
| so 0001 would mean (1 * 1), but 0110 would be (4 * 2) | 2019-05-06 16:21:53 |
<Shelwien> | well, you'd need all short lens, up to 16 or so | 2019-05-06 16:22:12 |
| but then it doesn't really matter whether its 32 or 33 | 2019-05-06 16:22:28 |
| especially with added support for rep-matches | 2019-05-06 16:22:42 |
<danlock> | If the len were packed to ~5 bits, would that give you 3 bits for flags or 11? I guess I'm thinking of 8-bit bytes, which would mean your uncompressed len value would comprise two bytes, right? | 2019-05-06 16:27:35 |
<Shelwien> | his current format is like this: | 2019-05-06 16:28:05 |
<FunkyBob> | danlock: yeah, for now I'm trying to stick with byte-aligned | 2019-05-06 16:28:08 |
<Shelwien> | 0LLLLLLL = literal run | 2019-05-06 16:28:16 |
| 1LLLLLLL DDDDDDDD DDDDDDDD = match for 64k window | 2019-05-06 16:28:43 |
| and a known useful thing is rep-match | 2019-05-06 16:29:09 |
* danlock nods | 2019-05-06 16:29:38 |
| that is, a match with same distance as previous match | 2019-05-06 16:29:39 |
| 10LLLLLL = rep-match | 2019-05-06 16:30:22 |
| 11LLLLLL DDDDDDDD DDDDDDDD = normal match | 2019-05-06 16:30:34 |
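A decode loop for the byte-aligned layout just listed might look like this. The MIN_MATCH bias of 2 is an assumption the chat doesn't pin down:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Token layout from the discussion:
//   0LLLLLLL                      literal run of L bytes
//   10LLLLLL                      rep-match: L+MIN bytes at the previous distance
//   11LLLLLL DDDDDDDD DDDDDDDD    match: L+MIN bytes at a 16-bit distance
std::string decode(const std::vector<uint8_t>& in) {
    const int MIN_MATCH = 2;                     // assumed bias
    std::string out;
    size_t pos = 0;
    int rep_dist = 0;
    while (pos < in.size()) {
        uint8_t t = in[pos++];
        if ((t & 0x80) == 0) {                   // 0LLLLLLL: literal run
            for (int i = 0; i < t; ++i) out.push_back((char)in[pos++]);
        } else {
            int len = (t & 0x3F) + MIN_MATCH;
            if (t & 0x40) {                      // 11LLLLLL: read new distance
                rep_dist = in[pos] | (in[pos + 1] << 8);
                pos += 2;
            }                                    // 10LLLLLL: reuse rep_dist
            for (int i = 0; i < len; ++i)
                out.push_back(out[out.size() - rep_dist]);
        }
    }
    return out;
}
```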
<FunkyBob> | danlock: my current conclusion as to why I lose to lz4 is they imply the match/literal field by _always_ having (literal run, match, offset) | 2019-05-06 16:30:43 |
| danlock: and can encode short runs in 4bits, where I can't | 2019-05-06 16:31:02 |
| Shelwien: will try | 2019-05-06 16:31:59 |
<Shelwien> | another interesting thing | 2019-05-06 16:32:29 |
<danlock> | Putting something into the spec is one way of reducing the number of things for the encoder to keep track of with extra bits | 2019-05-06 16:32:31 |
<Shelwien> | is that although rep-match is a simple straight optimization | 2019-05-06 16:32:53 |
| which is also the reason for the near-useless "rep stack" in many codecs | 2019-05-06 16:33:26 |
| but the main compression improvement from rep-matches actually comes from fuzzy matches | 2019-05-06 16:34:01 |
| for example, book1 file has a <P ##>\n page number markup | 2019-05-06 16:34:32 |
| so lzma encodes it as match:"<P 5" literal:"2" rep-match:">\n" | 2019-05-06 16:35:10 |
| and since you don't yet have large dictionary or complex match-finding structures | 2019-05-06 16:35:48 |
| it means you can continue comparing strings after first mismatch | 2019-05-06 16:36:08 |
| ie _properly_ look for fuzzy matches | 2019-05-06 16:36:24 |
| where most coders just use them accidentally | 2019-05-06 16:36:42 |
| when its the only choice | 2019-05-06 16:36:51 |
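The book1 example above (match:"<P 5", literal:"2", rep-match:">\n") can be reproduced with a toy splitter that does exactly what Shelwien suggests: keep comparing past the first mismatch instead of giving up. This only handles a single mismatched run, purely to illustrate the shape of a fuzzy match:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns {match_len, literal_len, rep_len}: the plain match up to the
// first mismatch, the mismatched run (sent as literals), and the
// rep-match that resumes at the same distance afterwards.
std::vector<int> fuzzy_segments(const std::string& prev, const std::string& cur) {
    size_t n = std::min(prev.size(), cur.size());
    size_t i = 0;
    while (i < n && prev[i] == cur[i]) ++i;              // plain match
    size_t m = i;
    while (i < n && prev[i] != cur[i]) ++i;              // literals
    size_t l = i - m;
    size_t r = 0;
    while (i + r < n && prev[i + r] == cur[i + r]) ++r;  // rep-match resumes
    return {(int)m, (int)l, (int)r};
}
```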
<danlock> | passive vs. active | 2019-05-06 16:37:53 |
<Shelwien> | danlock: which color scheme do you like more? http://nishi.dreamhosters.com/chanloglist.pl http://nishi.dreamhosters.com/chanloglist.pl?&cs2 | 2019-05-06 16:38:09 |
<danlock> | I like chanloglist.pl better than chanloglist.pl?&cs2, but people with dimmer displays might prefer the latter. | 2019-05-06 16:39:50 |
<Shelwien> | :) | 2019-05-06 16:40:23 |
<FunkyBob> | think I'll do some more debug analysis to see what rep-offset might gain me... as well as match sizes | 2019-05-06 16:42:02 |
| and lit lens | 2019-05-06 16:42:05 |
<Shelwien> | might be better to have optimal parsing first | 2019-05-06 16:42:52 |
<unic0rn> | FunkyBob: just curious, what's your current code size? | 2019-05-06 16:43:28 |
<Shelwien> | 13k with mingw | 2019-05-06 16:45:36 |
<unic0rn> | nah, i meant the source code | 2019-05-06 16:45:52 |
<Shelwien> | around 7-8k | 2019-05-06 16:46:06 |
<unic0rn> | i've mentioned 14k recently i think, but i was wrong. saw a status line, didn't realize it's from saving another file. actual encoder-in-progress is below 6k so far | 2019-05-06 16:46:32 |
<FunkyBob> | 14464 stripped on x86-64 linux | 2019-05-06 16:46:40 |
<unic0rn> | either i'm slow or turning it into rocket science. | 2019-05-06 16:46:49 |
<FunkyBob> | 5728 for the C file | 2019-05-06 16:47:22 |
<unic0rn> | there's no entropy coder? | 2019-05-06 16:48:09 |
<FunkyBob> | no | 2019-05-06 16:48:13 |
<unic0rn> | not yet? :) | 2019-05-06 16:48:50 |
<FunkyBob> | oh, and a 1498 byte "main" | 2019-05-06 16:49:05 |
| no, not yet :) | 2019-05-06 16:49:08 |
| I did find an interesting algorithm for building prefix codes [like Huffman] that makes it fairly cheap to update to "near optimal" encodings | 2019-05-06 16:49:43 |
<Shelwien> | adaptive bitcoding is totally useless | 2019-05-06 16:50:22 |
| its always slower than adaptive arithmetic coding | 2019-05-06 16:50:38 |
| and compresses worse | 2019-05-06 16:50:44 |
<unic0rn> | you may check deflate's wiki page, there's a short description of its huffman coding | 2019-05-06 16:51:15 |
| i would just go with range though | 2019-05-06 16:51:19 |
<Shelwien> | deflate uses static huffman coding, not adaptive | 2019-05-06 16:51:39 |
<unic0rn> | i know | 2019-05-06 16:51:50 |
<Shelwien> | saves a length table for a block in block header | 2019-05-06 16:51:59 |
| with ~64k blocks usually | 2019-05-06 16:52:06 |
<unic0rn> | well, since adaptive huffman is useless, that's i guess the only option as far as huffman is concerned | 2019-05-06 16:52:49 |
<Shelwien> | well, did you read about fritcode? | 2019-05-06 16:53:18 |
<unic0rn> | now i did. https://encode.ru/threads/1116-Charles-Blooms-new-huffman-optimisation | 2019-05-06 16:55:02 |
<Shelwien> | yes | 2019-05-06 16:55:14 |
| its basically two-stage entropy coding | 2019-05-06 16:55:47 |
| blockwise-static code optimized for data | 2019-05-06 16:56:20 |
<unic0rn> | i was considering fritcode earlier, that is, had that idea just kinda missing the point, i didn't think about adding another stage of entropy coding after that | 2019-05-06 16:56:32 |
<Shelwien> | and fully static code to compress the first code, which has intentionally skewed bit probabilities | 2019-05-06 16:57:08 |
<unic0rn> | but since fritcode is very simple, i was thinking about using one of the codes as an escape code | 2019-05-06 16:57:09 |
| for example, "set fritcode for last decoded symbol to 0" | 2019-05-06 16:57:42 |
<Shelwien> | that's common - eg. deflate has a special code for end-of-block | 2019-05-06 16:57:50 |
<unic0rn> | because rearranging this is far easier than rebuilding whole huffman tree | 2019-05-06 16:57:55 |
| so it could be adaptive | 2019-05-06 16:58:32 |
| and fast at the same time | 2019-05-06 16:58:43 |
<Shelwien> | uh, no | 2019-05-06 16:58:44 |
| optimal fritcode is harder to build than huffman | 2019-05-06 16:59:05 |
| but same block-wise static as huffman | 2019-05-06 16:59:35 |
| can get you better compression | 2019-05-06 16:59:42 |
| since with fritcode, symbol codes are not limited with 1/2^x | 2019-05-06 17:00:00 |
| probabilities, i mean | 2019-05-06 17:00:11 |
<unic0rn> | well, encoder wouldn't be exactly simple, because it would have to make quite a few estimations to find the most optimal route for switching codes | 2019-05-06 17:01:17 |
| but decoding would be fast | 2019-05-06 17:01:22 |
<Shelwien> | well, you can try | 2019-05-06 17:02:04 |
<unic0rn> | should be clearly faster than adaptive huffman | 2019-05-06 17:02:07 |
| nah. i see no point, unless one's bound to not using range nor arithmetic, or ans | 2019-05-06 17:02:28 |
<FunkyBob> | sorry, how is fritcode different to any other prefix scheme? | 2019-05-06 17:02:32 |
<Shelwien> | but adaptive bitcode wasn't ever used in any practical coder in last 20 years or so | 2019-05-06 17:02:51 |
<FunkyBob> | that post doesn't appear to explain it ... at all | 2019-05-06 17:03:17 |
<Shelwien> | its simply inefficient because of current cpu design | 2019-05-06 17:03:18 |
| FunkyBob: as i said, its a two-stage entropy coding scheme | 2019-05-06 17:03:45 |
| the fritcode itself is like huffman, but based on skewed bit probabilities | 2019-05-06 17:04:10 |
| like 0=1/3, 1=2/3 instead of both 1/2 | 2019-05-06 17:04:24 |
| which results in longer codes and better precision of symbol probability approximation | 2019-05-06 17:04:49 |
| and then we just need a 2nd stage code for bytes of 1st code | 2019-05-06 17:05:10 |
<unic0rn> | inefficient... lets talk about inefficient... elapsed time: 00:00:17.203 - current processing of 256k block, far from done, single core | 2019-05-06 17:05:24 |
<Shelwien> | which can be static huffman, or arithmetic, or anything | 2019-05-06 17:05:41 |
| bits with skewed probability are packed to bytes, also with known probability distribution | 2019-05-06 17:06:51 |
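The "not limited with 1/2^x" point can be made concrete with a toy cost calculation (this is not fritcode's actual construction). Over fair bits, a prefix codeword always costs a whole number of bits, so symbol probabilities are approximated by powers of 1/2; if the second-stage coder knows each bit is skewed (say 0 = 1/3, 1 = 2/3, Shelwien's example), codeword costs land on a much denser grid:

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Cost in (fractional) output bits of a codeword when the second-stage
// entropy coder codes each bit with its known skewed probability:
// bit '0' costs -log2(p0), bit '1' costs -log2(1 - p0).
double codeword_cost(const std::string& code, double p0) {
    double bits = 0;
    for (char c : code)
        bits += -std::log2(c == '0' ? p0 : 1.0 - p0);
    return bits;
}
```

For example the codeword "11" under p0 = 1/3 costs about 1.17 bits, i.e. it fits a symbol of probability 4/9, which no whole-bit code can match.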
<FunkyBob> | unic0rn: using which algorithms? | 2019-05-06 17:08:17 |
<unic0rn> | that information is classified ;) | 2019-05-06 17:08:36 |
| in short: my own | 2019-05-06 17:08:41 |
<FunkyBob> | ok | 2019-05-06 17:09:09 |
| well, here's hoping they get great ratios... for that much time :P | 2019-05-06 17:09:21 |
<unic0rn> | for now, there's hoping it'll work at all | 2019-05-06 17:09:42 |
| gotta finish stage 2 to have something possible to test with decompression, then add stage 3 - entropy coder | 2019-05-06 17:10:22 |
<FunkyBob> | that's a good first goal :) | 2019-05-06 17:10:23 |
<unic0rn> | for now i'm drowning in statistics from stage 1 | 2019-05-06 17:10:46 |
<Shelwien> | NNCP basic speed is 100 bytes/s :) | 2019-05-06 17:11:17 |
<unic0rn> | well, creating the slowest encoder around isn't exactly my goal, so i'll let it have that :P | 2019-05-06 17:12:00 |
<Shelwien> | because of that he had to add MT and stream interleaving right away | 2019-05-06 17:13:15 |
| got it 10x faster with that | 2019-05-06 17:13:22 |
| i guess 1kb/s is much better :) | 2019-05-06 17:13:35 |
<unic0rn> | at least i'm faster than that. and MT will help a lot, but that's left for the end | 2019-05-06 17:14:21 |
| well, in the end i'll probably rewrite half of it in assembler | 2019-05-06 17:14:47 |
<danlock> | There are bound to be people who don't mind slow encoders, but if faster encoders with similar compression ratios exist, a slow encoder might not be the way to go. rewriting in ASM will probably improve the speed and decrease the size, but be careful not to introduce new bugs. | 2019-05-06 17:19:55 |
<unic0rn> | i guess i wrote more crazy stuff in asm | 2019-05-06 17:22:00 |
<danlock> | asm is great for reducing size, but it might limit the program to a specific type of processor | 2019-05-06 17:23:11 |
<Shelwien> | asm is very inconvenient for compression research | 2019-05-06 17:23:55 |
| its one thing when you write a codec by format specification | 2019-05-06 17:24:21 |
| but when you have to actually experiment with it - tweak various parameters, precision etc | 2019-05-06 17:24:49 |
| asm just doesn't allow to do it | 2019-05-06 17:25:08 |
<danlock> | If you're doing active research, a high-level language is probably better because it enables you to change specific parts of the encoder for testing more easily. Only after you've found the best possible way to do things should you work on optimization, IMO. | 2019-05-06 17:27:42 |
| Yeah. what Shelwien said. ;) | 2019-05-06 17:28:36 |
<Shelwien> | in compression there's a very little choice of "high-level language" too | 2019-05-06 17:28:58 |
| basically from C and C++ | 2019-05-06 17:29:08 |
<unic0rn> | not exactly, danlock. i've got 2 definitions i knew for a fact won't change no matter what. rewrote them in asm, 3x speed increase. | 2019-05-06 17:29:33 |
| kinda helps with testing. | 2019-05-06 17:29:38 |
| and those were basically oneliners, more or less. | 2019-05-06 17:29:59 |
<Shelwien> | yeah, C++ compiler would do it automatically | 2019-05-06 17:30:06 |
<unic0rn> | i have my doubts about that. it would be faster, agreed on that, but those are pretty specific operations | 2019-05-06 17:31:01 |
<danlock> | ...and that's why I have so few comments to make on the encode forums, which is why the fact I've been told I don't have permission to reply (the couple of times I've tried to reply) on the encode forums isn't a very big deal. | 2019-05-06 17:31:04 |
<Shelwien> | uh, you probably didn't get the confirmation email | 2019-05-06 17:31:32 |
| what's your login on the forum? same? | 2019-05-06 17:31:43 |
<unic0rn> | of course there are intrinsics, but using them is hardly leaving optimization until the very end | 2019-05-06 17:31:58 |
<danlock> | yeah.. same | 2019-05-06 17:32:10 |
<Shelwien> | yeah, "awaiting email confirmation" | 2019-05-06 17:32:41 |
<danlock> | at least it (last time I checked) allows me to mark posts as helpful | 2019-05-06 17:32:42 |
<Shelwien> | switched to normal, you should be able to post now | 2019-05-06 17:33:12 |
| C++ compiler can do function inlining and global optimizations | 2019-05-06 17:34:59 |
| so in many cases, the whole algorithm is just inlined in main() | 2019-05-06 17:35:27 |
<danlock> | Thank you very much. Because you guys (the frequent posters, called "lurkers" by FunkyBob, IIRC) are so much more experienced and active than I am, I /very seldom/ have anything to say. Sometimes, though, I feel like I have a valid point. | 2019-05-06 17:36:02 |
<Shelwien> | and then, it can also do expression optimizations when there're constants | 2019-05-06 17:36:02 |
| and there's template syntax | 2019-05-06 17:36:27 |
* danlock thinks.... . o O ( template syntax? ) | 2019-05-06 17:37:02 |
| which lets you force the compiler to generate the code for multiple choices of specific constant value | 2019-05-06 17:37:06 |
<danlock> | ahh. | 2019-05-06 17:37:22 |
<Shelwien> | so you can get same performance like with constants | 2019-05-06 17:37:28 |
| but still choose their values in runtime | 2019-05-06 17:37:41 |
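The template pattern being described (and used in the ooz code linked below) boils down to compiling one copy of the hot loop per interesting constant, then selecting the instantiation at runtime, so inside the loop the value is a true compile-time constant. This toy is illustrative; `wrap_sum` and `window_bits` are made-up names:

```cpp
#include <cassert>
#include <cstdint>

// One instantiation per WINDOW_BITS value: inside the loop, mask is a
// compile-time constant the optimizer can fold.
template <int WINDOW_BITS>
uint32_t wrap_sum(const uint8_t* data, int n) {
    uint32_t mask = (1u << WINDOW_BITS) - 1;
    uint32_t sum = 0;
    for (int i = 0; i < n; ++i)
        sum += data[i & mask];
    return sum;
}

// Runtime choice of the "constant": dispatch once, outside the hot loop.
uint32_t wrap_sum_dispatch(const uint8_t* data, int n, int window_bits) {
    switch (window_bits) {
        case 15: return wrap_sum<15>(data, n);
        case 16: return wrap_sum<16>(data, n);
        default: return wrap_sum<20>(data, n);
    }
}
```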
<danlock> | Man, it's been too long since I was actively coding | 2019-05-06 17:37:42 |
<Shelwien> | https://github.com/rarten/ooz/blob/master/compr_kraken.cpp#L583 | 2019-05-06 17:38:48 |
| as example | 2019-05-06 17:38:55 |
<unic0rn> | yeah, we had that discussion. thing is, in this case i don't think C++ compiler would magically figure out what i'm trying to do and use proper asm instruction on its own. and using intrinsics just throws the "compiler will do that for you" out of the window. in this case, i don't think it would. and sure, whole code in general, compiled by C++ compiler, would be faster. but as it is, it's much | 2019-05-06 17:39:09 |
| easier to maintain for me, because of tons of optimizations built in by design to make it work at reasonable speed at all | 2019-05-06 17:39:09 |
| optimizations that would look much messier in C++, imho | 2019-05-06 17:39:46 |
<danlock> | I took a Uni class in ANSI C and another C class later on (er... no, that was a Calculus class in which the instructor gave me his old copy of a book about coding in C++) | 2019-05-06 17:39:52 |
<Shelwien> | sure, proper manually written code is better than what compilers can do | 2019-05-06 17:40:16 |
<unic0rn> | i'll optimize it further later on, it'll be fast enough regardless of the language | 2019-05-06 17:40:20 |
<Shelwien> | but then, you can usually only support a single target | 2019-05-06 17:40:40 |
| while there's at least 4 now | 2019-05-06 17:40:59 |
<danlock> | :highfive: unic0rn | 2019-05-06 17:41:01 |
<Shelwien> | SSE4/AVX/AVX2/AVX512 | 2019-05-06 17:41:05 |
| well, language syntax is important | 2019-05-06 17:41:36 |
<danlock> | AVX512. I think this is the first I've heard of that. | 2019-05-06 17:41:39 |
<Shelwien> | uh, my cpu has it? | 2019-05-06 17:41:50 |
| language syntax strongly affects your choices | 2019-05-06 17:42:33 |
| you won't do stuff that is hard to write in your chosen language | 2019-05-06 17:42:55 |
<danlock> | what cpu do you have? | 2019-05-06 17:42:58 |
<unic0rn> | luckily, forth has almost none | 2019-05-06 17:43:02 |
<Shelwien> | intel 7820X | 2019-05-06 17:43:14 |
| also, even with C++ I sometimes have to write custom preprocessors | 2019-05-06 17:44:01 |
| that is, scripts that generate c++ code from custom syntax | 2019-05-06 17:44:40 |
| here's an example: https://encode.ru/threads/3070-mod_CM-another-paq-submodel?p=59503&viewfull=1#post59503 | 2019-05-06 17:45:31 |
<danlock> | Mine's an i5 4670K which runs at 4192MHz under load. CPU-Z v1.88.0.x64 says it supports MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, VT-x, AES, AVX, AVX2, FMA3 | 2019-05-06 17:47:03 |
<unic0rn> | https://pastebin.com/raw/UMTf92Py | 2019-05-06 17:47:33 |
| as for redefining syntax | 2019-05-06 17:47:41 |
<Shelwien> | sure, but anything actually useful would be much harder | 2019-05-06 17:48:27 |
| like, how would you go about defining a custom float-point type in forth? | 2019-05-06 17:49:00 |
| like, you have a program that worked with lots of 32-bit floats | 2019-05-06 17:49:22 |
| then decided to save memory by reducing mantissa size to 8 bits | 2019-05-06 17:49:41 |
| and saving them as 16-bit values | 2019-05-06 17:49:53 |
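The "transparent" C++ version of this can be sketched as a 16-bit storage type that converts through float on load and store, so the algorithm's arithmetic stays unchanged. The 1-sign/7-exponent/8-mantissa split, truncation instead of rounding, and the lack of zero/denormal/infinity handling are all simplifications for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// 16-bit float storage: 1 sign bit, 7 exponent bits (bias 63), 8 mantissa
// bits.  Only normal values whose exponent fits the 7-bit range survive;
// low mantissa bits are truncated on store.
struct f16 {
    uint16_t bits = 0;
    f16() = default;
    f16(float f) {                       // store: narrow from float
        uint32_t u; std::memcpy(&u, &f, 4);
        bits = (uint16_t)(((u >> 31) << 15)              // sign
             | ((((u >> 23) & 0xFF) - 127 + 63) << 8)    // rebias exponent
             | ((u >> 15) & 0xFF));                      // top 8 mantissa bits
    }
    operator float() const {             // load: widen back to float
        uint32_t u = ((uint32_t)(bits >> 15) << 31)
                   | ((((bits >> 8) & 0x7F) + 127 - 63) << 23)
                   | ((uint32_t)(bits & 0xFF) << 15);
        float f; std::memcpy(&f, &u, 4);
        return f;
    }
};
```

With the conversions in place, changing a `std::vector<float>` of statistics to `std::vector<f16>` halves its memory while the surrounding code keeps compiling as-is.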
<danlock> | FORTH is a language I remember existing LONG ago... but I don't know anything else about it. | 2019-05-06 17:49:59 |
<Shelwien> | its used in bitcoin | 2019-05-06 17:50:12 |
<FunkyBob> | seriously? | 2019-05-06 17:50:26 |
<Shelwien> | yes | 2019-05-06 17:50:32 |
<FunkyBob> | for what? | 2019-05-06 17:50:37 |
<unic0rn> | by rewriting stores and loads. 2 basic operations. assuming calculations can still run in single precision and all you want is storing them as half | 2019-05-06 17:50:45 |
<Shelwien> | custom transaction checking | 2019-05-06 17:50:46 |
<unic0rn> | that's basically rewriting 2 words, preferably in asm i guess | 2019-05-06 17:51:02 |
<Shelwien> | unic0rn: how about working with 32-bit and 16-bit floats in the same code? | 2019-05-06 17:51:32 |
<unic0rn> | then you need separate definitions based on precision, and preferably some conversion words as well | 2019-05-06 17:52:16 |
<Shelwien> | well, in C++ its transparent | 2019-05-06 17:52:38 |
| i can just add a class with operator definitions | 2019-05-06 17:53:02 |
| and change the type of some variables to new class | 2019-05-06 17:53:18 |
<unic0rn> | forth does nothing on its own. which, depending on what you want, may be a good thing | 2019-05-06 17:53:35 |
<Shelwien> | code of actual algorithm would remain exactly the same | 2019-05-06 17:53:38 |
| yes, i know | 2019-05-06 17:53:50 |
| but it lacks too much of that "syntax sugar" | 2019-05-06 17:54:10 |
<unic0rn> | well, in general, same here. you can change some basic blocks upon which the rest of the code is built, and it'll work like nothing happened | 2019-05-06 17:54:19 |
<Shelwien> | it may be possible to implement on your own, but why bother? | 2019-05-06 17:54:24 |
<danlock> | Shelwien: Are you suggesting ideas as a way of stimulating imagination and generating new ideas/thoughts/creativity to potentially find new methods for accomplishing what you'd like to do, or helping us to learn things you're already familiar with, or.. what? | 2019-05-06 17:56:02 |
| Whatever your answer, I like it. | 2019-05-06 17:56:14 |
| My uncle, a chemist (who is mostly focused on spectrography, but also other fascinating things), often asks me for ideas about how to do certain things and it's always been a mind-expanding thing for me. | 2019-05-06 17:58:03 |
<Shelwien> | well, getting feedback is frequently useful | 2019-05-06 17:59:05 |
| i tried just writing down my thoughts about random things | 2019-05-06 17:59:41 |
| but explaining stuff to an actual person works better | 2019-05-06 18:00:19 |
<danlock> | When I was learning about chemistry long ago, I learned (as most beginning chemistry students probably do) about atoms having protons and (usually) neutrons in their nucleus and electrons in their respective "shells" orbiting the nucleus, but it gets a lot more complicated than that. (I'm thinking of your having mentioned that you had been discussing AI and quantum mechanics) | 2019-05-06 18:01:16 |
<unic0rn> | as for missing different targets, in case of my compression it may not matter much. kinda doubt any critical parts will be easily vectorizable. it will benefit from MT, but from SIMD... kinda doubt it | 2019-05-06 18:01:33 |
<Shelwien> | unic0rn: you may be surprised... | 2019-05-06 18:02:04 |
| even basic stuff like memset/memcpy benefits from it | 2019-05-06 18:02:17 |
<danlock> | I usually hope that things I say can "spark" a train of thought in someone that will somehow generate an "aha!" moment ... | 2019-05-06 18:02:36 |
<unic0rn> | except that basic stuff isn't performance critical there | 2019-05-06 18:02:39 |
| what eats up the most time, are loops that are hardly vectorizable | 2019-05-06 18:03:14 |
<Shelwien> | still, new extensions add more registers at least | 2019-05-06 18:03:47 |
| so there're all kind of indirect effects when its done by compiler | 2019-05-06 18:04:01 |
<danlock> | unic0rn: can you say that basic stuff will never, under any circumstances, become performance-critical in the future? | 2019-05-06 18:04:04 |
<Shelwien> | but of course you can't do it manually | 2019-05-06 18:04:19 |
| danlock: its supposedly a solution for the hutter-prize contest | 2019-05-06 18:04:50 |
| there's a time limit too, but its a few hours, so most of things would be ok | 2019-05-06 18:05:22 |
<unic0rn> | danlock: when that basic stuff is called once per 256kB block, it simply doesn't matter | 2019-05-06 18:05:41 |
| because the rest is so much slower | 2019-05-06 18:05:59 |
<Shelwien> | and its true that in theory forth could reduce the decoder size | 2019-05-06 18:06:13 |
<danlock> | I was watching someone code in a game jam a few years ago and I was amazed at how much debugging is taken care of by the top level ... erm..... I can't think of the acronym right now... ...as the coder was writing the source. | 2019-05-06 18:06:20 |
<Shelwien> | which is also counted as part of result in HP | 2019-05-06 18:06:24 |
<unic0rn> | it most likely won't be submitted, assuming it beats the result. | 2019-05-06 18:07:14 |
<Shelwien> | although i kinda doubt that its easy to find a compact forth JIT or interpreter | 2019-05-06 18:07:15 |
<unic0rn> | there are a few, one can always write his own as well | 2019-05-06 18:07:57 |
<Shelwien> | yes, but whats their code size? | 2019-05-06 18:08:35 |
<danlock> | (his desktop was being streamed online... and I was remembering how much less "hand-holding" or "pointing out discrepancies" existed in Turbo Pascal in DOS back before my TBI (traumatic brain injury), which (the TBI) is why I'll often seem scatterbrained | 2019-05-06 18:09:09 |
| ) | 2019-05-06 18:09:15 |
<unic0rn> | the one i'm using, 36k | 2019-05-06 18:09:27 |
<danlock> | 36k of source code? (I presume so, since that's what you were asking about earlier) | 2019-05-06 18:10:12 |
<Shelwien> | so that's all you need to run HP decoder on a clean machine? | 2019-05-06 18:10:25 |
<unic0rn> | no, 36k forth binary | 2019-05-06 18:10:25 |
<danlock> | oh. (note to self: find an online FORTH reference) | 2019-05-06 18:10:59 |
<unic0rn> | it loads ntdll, kernel32 and kernelbase, so yeah | 2019-05-06 18:11:03 |
<Shelwien> | not bad i guess, but the size of phda9dec is 43636 | 2019-05-06 18:11:18 |
| system dlls are ok, but forth itself would be likely counted towards the entry size | 2019-05-06 18:11:47 |
<unic0rn> | as you know, you can throw a ton of forth code on top, and the binary won't grow much | 2019-05-06 18:12:06 |
| of course it would be | 2019-05-06 18:12:12 |
<Shelwien> | yeah, but 43k is the full decoder size atm | 2019-05-06 18:12:24 |
<unic0rn> | this implementation allows bundling of your own code with the base binary. it actually gets there in compiled form | 2019-05-06 18:12:31 |
| because it performs compilation on the fly | 2019-05-06 18:12:44 |
<Shelwien> | so you're handicapped with forth | 2019-05-06 18:12:54 |
<danlock> | Ah. I guess it's formally called merely "Forth" ... my bad | 2019-05-06 18:13:02 |
<unic0rn> | even if, not by much. compression improvement should be much bigger to matter at all | 2019-05-06 18:13:48 |
<Shelwien> | about forth in bitcoin: https://sourceforge.net/p/bitcoin/mailman/message/31397880/ | 2019-05-06 18:13:51 |
| uh, that's quite unlikely :) | 2019-05-06 18:14:15 |
<unic0rn> | also, i don't care much about HP i guess. it's mostly a benchmark | 2019-05-06 18:14:48 |
| unlikely... maybe, maybe not | 2019-05-06 18:15:11 |
<Shelwien> | sure, nncp won't win HP either, but might work as advertisement of fabrice's NN library, as expected | 2019-05-06 18:15:45 |
| so getting a good result could be useful for other stuff just as well | 2019-05-06 18:16:30 |
| but a good result would be 16M at least | 2019-05-06 18:16:48 |
| and even that requires a lot of work and multiple components | 2019-05-06 18:17:14 |
| the main problem is actually the memory limit though | 2019-05-06 18:18:32 |
<unic0rn> | well, when i'll finish stage 2, i'll be able to test it without entropy coder, then i'll know 1) if it works or did i screw up something, and 2) what's the estimated ratio, by throwing the result at gzip | 2019-05-06 18:18:50 |
| the memory limit is actually not my problem | 2019-05-06 18:19:08 |
| i've solved that one | 2019-05-06 18:19:16 |
<Shelwien> | better try paq8px for entropy estimation | 2019-05-06 18:19:16 |
| ok :) | 2019-05-06 18:20:19 |
| normally 1gb is barely enough to fit a bt4 match index for enwik8 | 2019-05-06 18:20:44 |
<danlock> | The extent to which you should take into account the memory limit is related to the number of testers of your code you want, because they have finite memory limits | 2019-05-06 18:20:56 |
<Shelwien> | and then there's no space left for statistics | 2019-05-06 18:20:56 |
| well, HP contest rules specify a lower memory limit than what is usually available these days | 2019-05-06 18:21:36 |
| (1gb) | 2019-05-06 18:21:39 |
<unic0rn> | take into account that i'm using 256k blocks, but those are configurable. i could perhaps go with up to 4mb or so | 2019-05-06 18:21:57 |
<Shelwien> | without solid compression the results would be even worse :) | 2019-05-06 18:22:30 |
| (ie whole file as one block) | 2019-05-06 18:22:50 |
<danlock> | Steve Gibson at grc.com is the only person I can think of right now who typically codes exclusively in ASM. However, as he's developed his free authentication protocol, SQRL, he's had to use other IDEs for implementation on platforms other than x86. | 2019-05-06 18:24:08 |
<unic0rn> | nah. i'll just put quantum entangled receiver in the decompressor and make it slow enough so i can transmit the whole file by hand | 2019-05-06 18:24:15 |
| oh wait... it doesn't work that way | 2019-05-06 18:24:20 |
<Shelwien> | :) | 2019-05-06 18:24:52 |
<unic0rn> | :P | 2019-05-06 18:25:06 |
<Shelwien> | there's one tricky option that should be available | 2019-05-06 18:25:19 |
| its in theory possible to use OS files as reference | 2019-05-06 18:25:41 |
<danlock> | It annoys a lot of people when I say something which seems like a comment out of nowhere and like it will derail the conversation, but it's usually a direct reaction to something I've just read wherever I'm posting. | 2019-05-06 18:25:41 |
<unic0rn> | danlock: i was toying with assembler on zx spectrum first. then i wrote custom video mode + graphics converter entirely in asm on amiga 600. | 2019-05-06 18:26:00 |
| and then some 256b and similar demoscene stuff on pc | 2019-05-06 18:26:09 |
| all for fun | 2019-05-06 18:26:23 |
| Shelwien: i think that's forbidden | 2019-05-06 18:26:55 |
<Shelwien> | i don't think it is | 2019-05-06 18:28:16 |
<unic0rn> | you cannot use input from external sources, files included | 2019-05-06 18:28:24 |
<danlock> | unic0rn: that's nifty. I love demoscene stuff. What skilled coders can get out of chipsets introduced in the 1970s (like the Atari 8bit computers) and early 80s (Commodore 64/etc.) and PCs in the '90s | 2019-05-06 18:28:32 |
<unic0rn> | system libraries are allowed for calls like i/o, that's it | 2019-05-06 18:28:33 |
| also, Matt will run everything under Linux anyway | 2019-05-06 18:28:52 |
| and complain if it doesn't work, despite being an exe file | 2019-05-06 18:29:14 |
<Shelwien> | maybe, but linux also has plenty of system files, with plenty of text inside | 2019-05-06 18:29:16 |
<danlock> | I forgot to finish my sentence. sigh. What skilled coders can get... ...is really amazing. | 2019-05-06 18:29:24 |
<unic0rn> | yeah. good luck accessing those under wine, if he removed the / mapping | 2019-05-06 18:29:42 |
<danlock> | Mahoney? | 2019-05-06 18:29:49 |
<Shelwien> | yes | 2019-05-06 18:29:57 |
| paq8px already has an option to train the model with its own exe | 2019-05-06 18:30:49 |
| and i don't think that using usual system files is explicitly forbidden | 2019-05-06 18:31:11 |
| but its actually far from easy to make use of it | 2019-05-06 18:31:25 |
<danlock> | I haven't seen much of what's going on with him since he stopped actively participating on encode.ru and his domain hasn't been as frequently updated either. It seemed like it coincided with a hurricane that hit the south Florida area a while back. | 2019-05-06 18:31:50 |
<unic0rn> | it's actually nearly impossible. would work only with specific files in specific versions | 2019-05-06 18:31:56 |
| he's running. | 2019-05-06 18:32:10 |
| supposedly a lot. | 2019-05-06 18:32:17 |
<Shelwien> | not necessarily | 2019-05-06 18:32:20 |
<danlock> | isn't that always the case, though? He runs a lot. | 2019-05-06 18:32:41 |
<Shelwien> | for example, you can locate useful text strings in system files | 2019-05-06 18:32:50 |
| and include their hashes in decoder | 2019-05-06 18:32:59 |
<unic0rn> | man pages | 2019-05-06 18:33:08 |
| free dictionary | 2019-05-06 18:33:10 |
<Shelwien> | then find them by hashes on target system | 2019-05-06 18:33:14 |
| yeah | 2019-05-06 18:33:15 |
<danlock> | the northern hemisphere was hit by heavy weather this past couple of seasons, though... not that it necessarily impacted running when in the lower latitudes | 2019-05-06 18:34:39 |
<Shelwien> | danlock: what changed is that he retired from work, and decided to participate in all marathons in US | 2019-05-06 18:35:10 |
| so now has no time for programming | 2019-05-06 18:35:20 |
| https://www.facebook.com/mattmahoneyfl | 2019-05-06 18:36:10 |
<danlock> | ahh. I knew about the retirement, but not about the desire to participate in all marathons. I know of three marathons (and an Iron Man competition, which he might not be interested in) that recently occurred in my region of the western US, and if he wants to run in them all it'll take many years just because many of them happen in different cities and states on the same weekends or same days. He's smart enough to determine an opt | 2019-05-06 18:38:17 |
<Shelwien> | well, it seems to be exactly what he does | 2019-05-06 18:39:45 |
<danlock> | If the desire is to participate in every single marathon, that's a much bigger thing to take on than just a walk/run/bike from coast to coast | 2019-05-06 18:39:46 |
<Shelwien> | also he found a woman there :) | 2019-05-06 18:39:50 |
<danlock> | that'll reduce his availability by a lot even if she's participating in all the marathons with him | 2019-05-06 18:40:34 |
<Shelwien> | well, he somehow still finds time to post on quora | 2019-05-06 18:42:15 |
* danlock laments his own lack of energy and its debilitating effect on his life, which is more sleep than life some days, and more "awake but cognitively-absent" the rest of the time. | 2019-05-06 18:42:25 |
<danlock> | before the TBI, thinking never, ever tired me. after I awoke from the 14-day coma, it did and I learned that my physical energy and cognitive energy are separate, and that the higher-level thinking goes away a lot more quickly than the rudimentary "alert but not so aware" state of living. | 2019-05-06 18:45:10 |
| I'm not looking for sympathy. I'm merely stating my personal experience. | 2019-05-06 18:46:07 |
| prior to the TBI, my memory was very good, and maybe even superb, but I didn't remember everything. Since then, I remember fewer events from each discrete time period, whatever you define that to be, and, as a result, time seems to pass much more quickly for me. *shrug* | 2019-05-06 18:48:07 |
<Shelwien> | well, I was surprised by Matt's productivity at times | 2019-05-06 18:48:28 |
| like when he started zpaq project | 2019-05-06 18:48:37 |
<danlock> | some people have so much energy | 2019-05-06 18:48:39 |
<Shelwien> | but then, it was based on some pretty weird assumptions | 2019-05-06 18:49:18 |
| so despite investing a lot of work in it, its not really getting more popular | 2019-05-06 18:49:42 |
<danlock> | Sometimes, weird assumptions are where the best results are found at times. | 2019-05-06 18:49:47 |
| doh. sorry for the redundancy... heh. | 2019-05-06 18:50:04 |
<Shelwien> | unlike stuff like lz4, which needs 20 lines of C to decode :) | 2019-05-06 18:50:06 |
<danlock> | brb | 2019-05-06 18:50:29 |
<Shelwien> | well, the main feature in zpaq is forward compatibility | 2019-05-06 18:50:40 |
| the idea was actually introduced in rar | 2019-05-06 18:50:54 |
| and already proved to be useless there | 2019-05-06 18:51:14 |
| but Matt still decided to make his own version | 2019-05-06 18:51:26 |
| so now we have a zpaq config for pi number generation | 2019-05-06 18:52:01 |
| but its not fast, nor has good compression, nor good archive management syntax | 2019-05-06 18:52:59 |
| and even Matt seemingly stopped maintaining it | 2019-05-06 18:53:39 |
<danlock> | How does that work (the pi generator)? I imagine it relies on zpaq's recursive addition from each version to the next of the archive. | 2019-05-06 18:53:49 |
<Shelwien> | zpaq has a simple programming language in its config | 2019-05-06 18:54:33 |
<danlock> | I checked that other C book I have from that calculus class... it's about C, not C++ | 2019-05-06 18:54:59 |
<Shelwien> | the idea is that a new compression algorithm can be written in that language | 2019-05-06 18:55:03 |
| and then old zpaq version would be able to decode the archive created by new version with new algorithms | 2019-05-06 18:55:37 |
<danlock> | Have you experimented with that much? | 2019-05-06 18:55:40 |
<Shelwien> | because it would add the config to archive | 2019-05-06 18:55:56 |
| https://encode.ru/threads/1211-Compressing-pi | 2019-05-06 18:56:11 |
<danlock> | I'm glad he put it in the public domain or open sourced it, at least, so it's not going to remain in a backroom at Dell | 2019-05-06 18:57:10 |
<unic0rn> | danlock: https://i.warosu.org/data/g/img/0563/02/1472388528965.png - Shelwien will probably disagree ;) | 2019-05-06 18:57:11 |
<Shelwien> | i think it was his personal project, not related to dell | 2019-05-06 18:57:52 |
<danlock> | I thought that C was more portable than C++ and has become more of a universal fallback as time has passed | 2019-05-06 18:58:16 |
<Shelwien> | since its mostly portable via gcc anyway | 2019-05-06 18:58:41 |
| there'll be C++ wherever there's C | 2019-05-06 18:58:47 |
| and I agree that C++ standard library is bloated | 2019-05-06 18:59:05 |
| but nobody makes you use it | 2019-05-06 18:59:12 |
| most of C code can be compiled by C++ compiler anyway | 2019-05-06 18:59:33 |
| and C++ has new features which make programming more efficient | 2019-05-06 18:59:56 |
| like not having to write ptr-> before every variable | 2019-05-06 19:00:14 |
| or not having to copy-paste 10 versions of a function tuned to different constant values | 2019-05-06 19:00:47 |
<unic0rn> | you really do love your templates | 2019-05-06 19:01:04 |
<Shelwien> | yes; also coroutines | 2019-05-06 19:01:19 |
<danlock> | Do some of those features require more commenting, or will they remain readable to anyone who understands C++ code? | 2019-05-06 19:01:23 |
| templates can be very useful (remarked the redundant one) | 2019-05-06 19:02:05 |
<Shelwien> | the syntax is usually pretty simple | 2019-05-06 19:02:27 |
| but properly using it requires some experience | 2019-05-06 19:02:45 |
| since there're weird limits in places | 2019-05-06 19:03:06 |
| as advanced example you can see this: https://encode.ru/threads/3077-C-constexpr-experiments | 2019-05-06 19:04:09 |
<danlock> | (re: zpaq's integrated encoder) I guess we need someone with Matt's endurance to take that idea and run with it (no pun intended) | 2019-05-06 19:08:06 |
<Shelwien> | no, its just a failed idea | 2019-05-06 19:09:03 |
| like, its impossible to fit a new-style codec, like zstd, into zpaq | 2019-05-06 19:09:44 |
| because Matt decided that his entropy coding and file format are already perfect | 2019-05-06 19:10:40 |
| it'd be much better to provide a binary plugin interface | 2019-05-06 19:11:45 |
| and maybe asymmetric cryptography for plugin signing | 2019-05-06 19:12:13 |
<danlock> | That's unfortunate. I thought he would be open to improvements/insertions of newer methods into that. | 2019-05-06 19:12:22 |
<Shelwien> | he expected that it would become popular, and people would add support for everything necessary on their own | 2019-05-06 19:13:15 |
| its possible, but | 2019-05-06 19:13:39 |
| (1) too much work because zpaql is basically limited asm | 2019-05-06 19:13:56 |
<danlock> | If he weren't trying to accomplish his running feat, I wouldn't be surprised if he changed zpaq to allow that. I know it's been changed before in ways that made it incompatible with prior versions | 2019-05-06 19:14:18 |
<Shelwien> | (2) people don't like to give away their coders | 2019-05-06 19:14:38 |
<danlock> | I don't know whether those were merely bugfixes, though | 2019-05-06 19:14:40 |
<Shelwien> | (3) progress didn't stop at Matt's entropy coder | 2019-05-06 19:15:09 |
<danlock> | unfortunately, compression time and decompression time are not asymmetric with zpaq | 2019-05-06 19:16:02 |
<Shelwien> | they really are | 2019-05-06 19:16:38 |
| there's BWT and some kind of LZ even in standard version | 2019-05-06 19:17:01 |
<danlock> | oh? I thought it was about the same time for each | 2019-05-06 19:17:03 |
<Shelwien> | BWT is asymmetric, it doesn't need sorting on decode | 2019-05-06 19:17:21 |
| https://github.com/pothos/zpaqlpy | 2019-05-06 19:17:37 |
<danlock> | so not strictly symmetric, but if you use a high-compression mode, won't it take nearly as long to decomp as it did to compress? | 2019-05-06 19:19:07 |
<Shelwien> | well, CM _is_ symmetric | 2019-05-06 19:19:41 |
| and CM _does_ provide the best compression | 2019-05-06 19:20:02 |
| but if you want fast decompression, just don't use it? | 2019-05-06 19:20:17 |
<danlock> | yeah. | 2019-05-06 19:20:59 |
| I've been using rz with the sfx creator provided by ScottX (or somebody at encode) to compress some files more than 7z will, because that includes the encoder with the file. The next version might not be compatible with what was last released by Christian, so that type of compression is needed. | 2019-05-06 19:27:00 |
| ...if I want to ensure I can extract what I compressed... | 2019-05-06 19:27:55 |
<Shelwien> | well, Christian never does open source, so you're out of luck here :) | 2019-05-06 19:28:57 |
| you can try lzna/kraken i guess, since it was decompiled not long ago | 2019-05-06 19:29:24 |
| https://encode.ru/threads/3102-Oodle-compressor-open-source?p=59982&viewfull=1#post59982 | 2019-05-06 19:29:51 |
<danlock> | I'll probably just do what I've been doing until Christian can find time to release his next binary (he indicated it has better compression for some things), unless something better from Bulat Ziganshin or someone else comes along first. | 2019-05-06 19:33:24 |
<Shelwien> | sure | 2019-05-06 19:34:05 |
| https://encode.ru/threads/2989-exepack?p=57682&viewfull=1#post57682 | 2019-05-06 19:35:10 |
<danlock> | Igor Pavlov increased compression in 18.05 for a lot of my older (and some recent) .7z files (decompressed and recompressed) | 2019-05-06 19:35:33 |
| That's great, and RAR is a good codec for a lot of things, but since updated versions of 7z started compressing better for types of files at which RAR was previously the very best, it's mostly on my PC just for quick path-aware recompressions of ZIP files and stuff. | 2019-05-06 19:39:37 |
| (files I'll usually go back to later and recompress using something which takes longer but will result in a smaller file) | 2019-05-06 19:40:30 |
<Shelwien> | well, rar has its own better sfx, its just a demo | 2019-05-06 19:41:09 |
<danlock> | I wasn't talking specifically about sfx archives, but yeah.. and I think Peazip includes some pre-compressed sfx generators that will make smaller sfx too. | 2019-05-06 19:42:54 |
| Yikes. What I'm typing is starting to sound ... like I'm tired and/or hungry. time for some food. thanks for the conversation. :) | 2019-05-06 19:48:39 |
<Shelwien> | :) | 2019-05-06 19:49:04 |
*** danlock has left the channel | 2019-05-06 21:36:39 |
*** danlock has joined the channel | 2019-05-06 21:39:20 |
<danlock> | info | 2019-05-06 21:46:17 |
<unic0rn> | ERROR: command not found. please upgrade to WINZIP .007 | 2019-05-06 21:50:15 |
<danlock> | :) | 2019-05-06 21:52:20 |
*** danlock has left the channel | 2019-05-06 23:49:44 |
*** maniscalco_ has joined the channel | 2019-05-07 01:38:43 |
<maniscalco_> | quiet around here tonight, huh ? | 2019-05-07 01:42:12 |
*** maniscalco_ has left the channel | 2019-05-07 02:11:41 |
<unic0rn> | and left. | 2019-05-07 02:26:28 |
| well, number one rule when you wanna get some coding done: stop looking at IRC | 2019-05-07 02:26:43 |
<FunkyBob> | heh | 2019-05-07 02:35:35 |
<unic0rn> | i'm coding. i'm not looking. | 2019-05-07 02:54:42 |
<FunkyBob> | sure sure :P | 2019-05-07 02:57:57 |
<Shelwien> | https://encode.ru/threads/3110-gcc-9-1-1-released?goto=newpost | 2019-05-07 04:47:12 |
| did a benchmark, somehow gcc82 has the best results? | 2019-05-07 05:42:23 |
| except for PGO which is troublesome to use in gcc | 2019-05-07 05:42:42 |
*** Jibz has joined the channel | 2019-05-07 08:08:47 |
| hi | 2019-05-07 08:08:55 |
<Jibz> | hey :) | 2019-05-07 08:09:05 |
<Shelwien> | what do you think would be the best token layout for LZ4-like coder? | 2019-05-07 08:10:08 |
| 64k window, best compression? | 2019-05-07 08:10:31 |
<Jibz> | probably depends on the type of data (text/binary), one interesting aspect to me would be if it would be better to make a format that enables efficient optimal parsing (like lz4), or to use things like repeat match offsets that improve compression but introduce dependencies | 2019-05-07 08:16:21 |
<Shelwien> | repeat matches are easy enough to handle with optimal parsing | 2019-05-07 08:17:04 |
| lzma does it, for example | 2019-05-07 08:17:20 |
<Jibz> | that would be for values of optimal that are not optimal then, right? :) | 2019-05-07 08:17:46 |
| I should probably say bit-optimal instead | 2019-05-07 08:18:22 |
<Shelwien> | dynamic programming approach should work for any sequential compression method | 2019-05-07 08:18:24 |
| the problem with perfect parsing opt for formats with entropy coding | 2019-05-07 08:19:12 |
| is either it being not sequential (static bitcode with len table in block header) | 2019-05-07 08:19:46 |
| or having to store a copy of all statistics per element of optimizer token array | 2019-05-07 08:20:44 |
| lzma "solves" that by only updating counters once per 128 bytes in optimizer | 2019-05-07 08:21:36 |
<Jibz> | yes, true | 2019-05-07 08:22:03 |
<Shelwien> | but i think it should be still possible to make a perfect version | 2019-05-07 08:22:03 |
| by storing stat diffs per element | 2019-05-07 08:22:20 |
| i already did optimal parsing with entropy coding in cdm | 2019-05-07 08:22:46 |
| so it would be interesting to port it to lzma sometime | 2019-05-07 08:23:16 |
| btw, this is the element struct from lzma: | 2019-05-07 08:24:57 |
| struct COptimal { | 2019-05-07 08:24:59 |
| uint price; | 2019-05-07 08:24:59 |
| uint state; | 2019-05-07 08:24:59 |
| uint prev1IsChar; | 2019-05-07 08:24:59 |
| uint prev2; | 2019-05-07 08:24:59 |
| uint posPrev2; | 2019-05-07 08:24:59 |
| uint backPrev2; | 2019-05-07 08:25:00 |
| uint posPrev; | 2019-05-07 08:25:00 |
| uint backPrev; | 2019-05-07 08:25:01 |
| uint backs[LZMA_NUM_REPS]; | 2019-05-07 08:25:01 |
| }; | 2019-05-07 08:25:02 |
| so it tracks all 4 rep-distances | 2019-05-07 08:25:49 |
<Jibz> | you could try out all different token layouts on a set of test files and see which gives the best compression on average? I assume people like charles, yann, and ilya already did this, but likely looking for a compromise between speed and ratio | 2019-05-07 08:29:52 |
<Shelwien> | i don't think they really tried | 2019-05-07 08:30:57 |
| i never heard of anybody making a universal parsing optimizer, which would work with any layout | 2019-05-07 08:31:18 |
| so usually once they implement optimal parsing, the format is locked | 2019-05-07 08:31:50 |
| and while there're certainly some choices that are always good | 2019-05-07 08:32:29 |
| like literal runs vs literal flags | 2019-05-07 08:32:44 |
| or rep-distances | 2019-05-07 08:32:56 |
| i don't think anybody tried to optimize a limited set of len or distance values | 2019-05-07 08:33:28 |
| an even further step would be to optimize a whole state machine, i suppose | 2019-05-07 08:34:45 |
| btw | 2019-05-07 08:35:47 |
| for bytewise LZ | 2019-05-07 08:36:00 |
| if we could put input data into output buffer (probably at the end of buffer) | 2019-05-07 08:36:48 |
| the same code could be used to copy literal runs and matches | 2019-05-07 08:37:15 |
| might be useful? | 2019-05-07 08:37:17 |
<Jibz> | hmm how do you mean? like a circular buffer? | 2019-05-07 08:38:45 |
<Shelwien> | no, fast coders are usually memory-to-memory anyway | 2019-05-07 08:39:51 |
| but usually literals have to be copied from input buffer to output | 2019-05-07 08:40:12 |
| and match string from output to output | 2019-05-07 08:40:22 |
| so it has to be different code | 2019-05-07 08:40:34 |
| but if we start by copying compressed input to the end of output buffer | 2019-05-07 08:40:57 |
| it becomes kinda the same | 2019-05-07 08:41:02 |
| btw | 2019-05-07 08:41:23 |
| i also used a similar trick in my deflate decoder | 2019-05-07 08:41:35 |
| it has a 256-byte literal array just before the 32k window buffer | 2019-05-07 08:41:59 |
| and a literal is turned into a len=1;dist=pos+256-lit match | 2019-05-07 08:43:02 |
| and its still faster like that | 2019-05-07 08:43:29 |
| because otherwise there's an unpredictable branch on token type | 2019-05-07 08:43:49 |
<Jibz> | interesting, and it is faster due to avoiding the branch not the literal table being in the same 64k cache? | 2019-05-07 08:45:44 |
<Shelwien> | well, of course literal table not being cached would make it worse | 2019-05-07 08:47:09 |
| but match copy loop is vectorized | 2019-05-07 08:47:42 |
| and its still faster to copy a whole vector than do a if( is_literal ) { *ptr++=lit; } else {...} | 2019-05-07 08:48:46 |
<Jibz> | nice idea :) | 2019-05-07 08:49:27 |
<Shelwien> | another interesting thing that i have there is __assume( len<32 ) | 2019-05-07 08:50:12 |
| modern compilers are rather forceful about converting copy loops to memcpy | 2019-05-07 08:50:42 |
| because they have an optimized/vectorized memcpy intrinsic | 2019-05-07 08:51:03 |
| so when i looked at asm of my match copy loop | 2019-05-07 08:51:21 |
| it actually had 3 versions there | 2019-05-07 08:51:43 |
| the last one being something like if( len>96 ) memcpy( ... ); | 2019-05-07 08:52:11 |
| it was very tricky to block that | 2019-05-07 08:52:56 |
| i mean, memcpy has versions for all vector extensions, plus loop preparation code at start | 2019-05-07 08:54:01 |
| so it may be efficient for copying megabytes of data | 2019-05-07 08:54:14 |
| but for LZ matches its certainly bad | 2019-05-07 08:54:22 |
| and compilers force-replace any kind of memory copy loop with it | 2019-05-07 08:54:39 |
<Jibz> | that is interesting, though the use of assume seems dangerous in case len might be more | 2019-05-07 09:01:05 |
<Shelwien> | https://godbolt.org/z/msimIx | 2019-05-07 09:02:05 |
| well, its not _that_ smart atm | 2019-05-07 09:02:21 |
| memcpy is apparently only in IntelC version | 2019-05-07 09:02:50 |
| but gcc still generates too many dumb branches | 2019-05-07 09:03:05 |
<Jibz> | gcc has an assume with probability, might be interesting for this | 2019-05-07 09:05:05 |
<Shelwien> | i tested it, doesn't seem to do anything good atm | 2019-05-07 09:05:32 |
<Jibz> | okay :) | 2019-05-07 09:05:48 |
<Shelwien> | builtin_expect is really important though | 2019-05-07 09:06:20 |
<Jibz> | would have been nice if you could tell it the distribution of short to long matches and get a copy that matches | 2019-05-07 09:06:39 |
<Shelwien> | its very useful for making various bounds-checking branches almost free | 2019-05-07 09:06:56 |
| you can actually do it easily enough | 2019-05-07 09:07:25 |
| via PGO | 2019-05-07 09:07:36 |
<Jibz> | good point | 2019-05-07 09:07:47 |
<Shelwien> | but its just not useful atm | 2019-05-07 09:08:05 |
| IntelC PGO is much better than gcc's | 2019-05-07 09:08:19 |
| gcc's was never useful in my tests | 2019-05-07 09:08:29 |
| but even intel's doesn't do anything that smart | 2019-05-07 09:08:41 |
| !next | 2019-05-07 16:30:04 |