*** NCDR has joined the channel		2019-05-06 07:58:56
*** NCDR has left the channel		2019-05-06 08:10:29
<FunkyBob>	o/	2019-05-06 12:54:24
<Shelwien>	?	2019-05-06 12:56:53
<FunkyBob>	waving hello	2019-05-06 13:02:16
	just got to day1 of the PyCon sprints	2019-05-06 13:02:33
	what timezone are you in, Shelwien ?	2019-05-06 13:02:41
*** maniscalco_ has left the channel		2019-05-06 13:08:29
<Shelwien>	utc+2	2019-05-06 13:14:42
	just tested lztp with enwik8	2019-05-06 13:15:02
	32k window: 40,674,412	2019-05-06 13:15:25
	64k window: 40,570,224	2019-05-06 13:15:47
	greedy parsing	2019-05-06 13:15:54
<FunkyBob>	lztp?	2019-05-06 13:16:00
	what's the algorithms?	2019-05-06 13:16:27
<Shelwien>	a mostly-bytewise lz scheme from tplink router	2019-05-06 13:16:31
<FunkyBob>	ah	2019-05-06 13:16:35
	that's quite impressive	2019-05-06 13:16:39
<Shelwien>	i reverse-engineered it some time ago	2019-05-06 13:16:47
	its kinda like lzss	2019-05-06 13:16:57
	but with an interesting quirk	2019-05-06 13:17:09
<FunkyBob>	ulz can get about that, in ~1min on my machine	2019-05-06 13:17:16
<Shelwien>	d16 encoding takes 6s on enwik8	2019-05-06 13:17:57
<FunkyBob>	d16?	2019-05-06 13:18:36
<Shelwien>	1<<16 window	2019-05-06 13:18:53
<FunkyBob>	ah	2019-05-06 13:18:57
<Shelwien>	aka 64k	2019-05-06 13:18:58
<FunkyBob>	for reference, how long does gzip -9 take?	2019-05-06 13:19:13
<Shelwien>	5.4s	2019-05-06 13:19:40
	4.1s for lztp with d15	2019-05-06 13:20:43
<FunkyBob>	hrm	2019-05-06 13:21:21
<Shelwien>	anyway, what i wanted to say	2019-05-06 13:21:27
<FunkyBob>	gzip -9 enwi8 for me takes ~29sec	2019-05-06 13:21:34
<Shelwien>	dunno, its not at it best atm	2019-05-06 13:21:50
<FunkyBob>	mmm?	2019-05-06 13:21:50
<Shelwien>	i have 4 threads of optimizer running	2019-05-06 13:22:00
	...is that lzss just uses packed bits for literal/match flags	2019-05-06 13:22:28
	while lztp developer went a step further	2019-05-06 13:22:47
<FunkyBob>	hmm... must have had my laptop unplugged when I ran that.. it's only ~17sec now	2019-05-06 13:23:08
	Shelwien: mmm?	2019-05-06 13:23:16
<Shelwien>	and turned it into an actual interleaved bitstream	2019-05-06 13:23:16
	in lzss its just a flag byte	2019-05-06 13:23:34
<FunkyBob>	I'm toying with the idea of a larger history window, perhaps 20 bits	2019-05-06 13:23:41
	ah, so a byte indicating the next 8 terms?	2019-05-06 13:24:02
<Shelwien>	but in lztp they also put bits of variable-length len/dist there	2019-05-06 13:24:10
<FunkyBob>	I was thinking I could use the first nibble for (match, size of len, 2bit size-of-offset)	2019-05-06 13:24:36
<Shelwien>	and then there's a need to rearrange bytes from bit stream and byte stream	2019-05-06 13:24:45
<FunkyBob>	where offset would be 4bit, 12bit, 20bit, or rep	2019-05-06 13:24:46
<Shelwien>	in order of how decoder would read them	2019-05-06 13:24:52
<FunkyBob>	ah, yes, I recall reading something of this	2019-05-06 13:25:08
<Shelwien>	but otherwise i think this scheme has potential	2019-05-06 13:25:14
<FunkyBob>	clearly	2019-05-06 13:27:12
<Shelwien>	btw, i was talking about this: https://encode.ru/threads/3005-TP-LINK-router-config-compression	2019-05-06 13:30:02
<FunkyBob>	yeah, I recall the thread	2019-05-06 13:33:59
	am building up to trying a more "optimal" parsing... forward scan, reverse walk thing	2019-05-06 13:58:22
	clearly it'll be slower, as it will require a match scan at _every_ byte	2019-05-06 13:58:41
<Shelwien>	yes, but only at encoding, so that's ok	2019-05-06 14:13:31
	the common optimization is like "fb" thing in lzma	2019-05-06 14:13:52
	you can set the threshold match len for greedy encoding	2019-05-06 14:14:07
	ie if len>32, just encode the match	2019-05-06 14:14:17
	its also efficient for splitting the data into separately optimized segments	2019-05-06 14:14:44
	so lzma manages to be streamable despite optimal parsing	2019-05-06 14:15:37
	also, i'd suggest looking into BWT	2019-05-06 14:16:33
	its the most efficient way to find all matches at once :)	2019-05-06 14:16:58
<FunkyBob>	:)	2019-05-06 14:17:22
	have just tried gating lazy parsing on match < 32... no noticeable difference for enwik8... trying 9	2019-05-06 14:21:54
	clearly I need to pick a different cutoff point... but, yah, I see the potential	2019-05-06 14:33:50
<Shelwien>	there's also usually a depth limit on hash chain ("mc" in lzma)	2019-05-06 14:35:17
<FunkyBob>	currently I search back as far as the window allows	2019-05-06 14:43:06
<Shelwien>	yeah, but that can fail on redundant data	2019-05-06 14:44:02
<FunkyBob>	"fail" how?	2019-05-06 14:44:26
<Shelwien>	like, generate the file like "abcd" x 1000000	2019-05-06 14:44:29
	and test the speed	2019-05-06 14:44:32
<FunkyBob>	right,o k	2019-05-06 14:44:44
	degenerate case	2019-05-06 14:44:47
<Shelwien>	unfortunately its surprisingly common	2019-05-06 14:45:04
	just consider filetypes like bmp or wav	2019-05-06 14:45:13
	or some log files	2019-05-06 14:45:26
<FunkyBob>	hrm	2019-05-06 14:46:06
<unic0rn>	or XPADDING in exe	2019-05-06 14:47:12
	altough that's not that long	2019-05-06 14:47:25
<Shelwien>	exes too, yeah	2019-05-06 14:48:16
	they tend to have lots of tables and unrolled code	2019-05-06 14:48:26
<FunkyBob>	ok, so, thinking out aloud... if I scan forward, and at each position store the offset of the best match, but at the _end_ [here + match len] store the len... I can traverse back easily by knowing how far back to look for the match	2019-05-06 15:12:26
	and only update the length if it's longer than the preveious stored one	2019-05-06 15:12:48
<Shelwien>	yes	2019-05-06 15:14:33
	well, in general its slightly different	2019-05-06 15:15:08
<FunkyBob>	then again, I can store the full match details	2019-05-06 15:15:11
	at the offset of then end of the match	2019-05-06 15:15:22
	Shelwien: oh?	2019-05-06 15:15:26
<Shelwien>	1) you have to replace the token if its codelen is smaller, rather than just match len	2019-05-06 15:15:46
<FunkyBob>	yes	2019-05-06 15:16:01
	but for now, all my tokens are the same size	2019-05-06 15:16:09
<Shelwien>	for example, shorter match might have shorter code	2019-05-06 15:16:10
	2) you also have to count literal runs the same way	2019-05-06 15:16:28
<FunkyBob>	shall we say "only update if it's of a greater saving"	2019-05-06 15:16:33
<Shelwien>	yes	2019-05-06 15:16:45
	its basically recursion	2019-05-06 15:17:27
<FunkyBob>	interesting... lz4 works on a max of 4MB blocks	2019-05-06 15:18:07
*** danlock has joined the channel		2019-05-06 16:08:16
<Shelwien>	hi	2019-05-06 16:09:55
<danlock>	Greetings. I'm just looking around; I just came from the encode forums.	2019-05-06 16:12:07
<FunkyBob>	o/	2019-05-06 16:12:50
	I'm playing around with possible lit/mat/offset encodings... find something decent that's not just "LZ4 again"	2019-05-06 16:14:44
<danlock>	:) I appreciate the welcome. I don't have any compression thoughts or questions right now, though. @FunkyBob good luck!	2019-05-06 16:15:10
<Shelwien>	FunkyBob: btw, how about making a lookup table for actual len values?	2019-05-06 16:16:00
	danlock: it doesn't really have to be compression	2019-05-06 16:16:24
	we just had a discussion about AI and quantum mechanics :)	2019-05-06 16:16:42
<FunkyBob>	Shelwien: hrm	2019-05-06 16:16:56
<Shelwien>	like, all len values up to 16 or so, and then something exponent-based	2019-05-06 16:17:40
	and pack the len to 5 bits or so, the use saved bits for flags	2019-05-06 16:18:11
<danlock>	It's good to know I'm in here with like-minded people.	2019-05-06 16:18:49
<FunkyBob>	Shelwien: I wonder how much I'd lose by lowering match resolution as length grows...	2019-05-06 16:20:16
<Shelwien>	based on freqtables that i made - not much	2019-05-06 16:20:51
<FunkyBob>	like you were daying, exponent based.... encode unit + multiplier	2019-05-06 16:21:06
	so 0001 would mean (1 * 1), but 0110 would be (4 * 2)	2019-05-06 16:21:53
<Shelwien>	well, you'd need all short lens, up to 16 or so	2019-05-06 16:22:12
	but then it doesn't really matter whether its 32 or 33	2019-05-06 16:22:28
	especially with added support for rep-matches	2019-05-06 16:22:42
<danlock>	If the len were packed to ~5 bits, would that give you 3 bits for flags or 11? I guess I'm thinking of 8-bit bytes, which would mean your uncompressed len value would comprise two bytes, right?	2019-05-06 16:27:35
<Shelwien>	his current format is like this:	2019-05-06 16:28:05
<FunkyBob>	danlock: yeah, for now I'm trying to stick with byte-aligned	2019-05-06 16:28:08
<Shelwien>	0LLLLLLL = literal run	2019-05-06 16:28:16
	1LLLLLLL DDDDDDDD DDDDDDDD = match for 64k window	2019-05-06 16:28:43
	and a known useful thing is rep-match	2019-05-06 16:29:09
* danlock nods		2019-05-06 16:29:38
	that is, a match with same distance as previous match	2019-05-06 16:29:39
	10LLLLLL = rep-match	2019-05-06 16:30:22
	11LLLLLL DDDDDDDD DDDDDDDD = normal match	2019-05-06 16:30:34
<FunkyBob>	danlock: my current conclusion as to why I lose to lz4 is they imply the match/literal field by _always_ having (literal run, match, offset)	2019-05-06 16:30:43
	danlock: and can encode short runs in 4bits, where I can't	2019-05-06 16:31:02
	Shelwien: will try	2019-05-06 16:31:59
<Shelwien>	another interesting thing	2019-05-06 16:32:29
<danlock>	Putting something into the spec is one way of reducing the number of things for the encoder to keep track of with extra bits	2019-05-06 16:32:31
<Shelwien>	is that although rep-match is a simple straight optimization	2019-05-06 16:32:53
	which is also the reason for the near-useless "rep stack" in many codecs	2019-05-06 16:33:26
	but the main compression improvement from rep-matches actually comes from fuzzy matches	2019-05-06 16:34:01
	for example, book1 file has a <P ##>\n page number markup	2019-05-06 16:34:32
	so lzma encodes it as match:"<P 5" literal:"2" rep-match:">\n"	2019-05-06 16:35:10
	and since you don't yet have large dictionary or complex match-finding structures	2019-05-06 16:35:48
	it means you can continue comparing strings after first mismatch	2019-05-06 16:36:08
	ie _properly_ look for fuzzy matches	2019-05-06 16:36:24
	where most coders just use them accidentally	2019-05-06 16:36:42
	when its the only choice	2019-05-06 16:36:51
<danlock>	passive vs. active	2019-05-06 16:37:53
<Shelwien>	danlock: which color scheme do you like more? http://nishi.dreamhosters.com/chanloglist.pl http://nishi.dreamhosters.com/chanloglist.pl?&cs2	2019-05-06 16:38:09
<danlock>	I like chanloglist.pl better than chanloglist.pl?&cs2, but people with dimmer displays might prefer the latter.	2019-05-06 16:39:50
<Shelwien>	:)	2019-05-06 16:40:23
<FunkyBob>	think I'll do some more debug analysis to see what rep-offset might gain me... as well as match sizes	2019-05-06 16:42:02
	and lit lens	2019-05-06 16:42:05
<Shelwien>	might be better to have optimal parsing first	2019-05-06 16:42:52
<unic0rn>	FunkyBob: just curious, what's your current code size?	2019-05-06 16:43:28
<Shelwien>	13k with mingw	2019-05-06 16:45:36
<unic0rn>	nah, i meant the source code	2019-05-06 16:45:52
<Shelwien>	around 7-8k	2019-05-06 16:46:06
<unic0rn>	i've mentioned 14k recently i think, but i was wrong. saw a status line, didn't realize it's from saving another file. actual encoder-in-progress is below 6k so far	2019-05-06 16:46:32
<FunkyBob>	14464 stripped on x86-64 linux	2019-05-06 16:46:40
<unic0rn>	either i'm slow or turning it into rocket science.	2019-05-06 16:46:49
<FunkyBob>	5728 for the C file	2019-05-06 16:47:22
<unic0rn>	there's no entropy coder?	2019-05-06 16:48:09
<FunkyBob>	no	2019-05-06 16:48:13
<unic0rn>	not yet? :)	2019-05-06 16:48:50
<FunkyBob>	oh, and a 1498 byte "main"	2019-05-06 16:49:05
	no, not yet :)	2019-05-06 16:49:08
	I did find an interesting algorithm for building prefix codes [like Huffman] that makes it fairly cheap to update to "near optiomal" encodings	2019-05-06 16:49:43
<Shelwien>	adaptive bitcoding is totally useless	2019-05-06 16:50:22
	its always slower than adaptive arithmetic coding	2019-05-06 16:50:38
	and compresses worse	2019-05-06 16:50:44
<unic0rn>	you may check deflate's wiki page, there's a short description of its huffman coding	2019-05-06 16:51:15
	i would just go with range though	2019-05-06 16:51:19
<Shelwien>	deflate uses static huffman coding, not adaptive	2019-05-06 16:51:39
<unic0rn>	i know	2019-05-06 16:51:50
<Shelwien>	saves a length table for a block in block header	2019-05-06 16:51:59
	with ~64k blocks usually	2019-05-06 16:52:06
<unic0rn>	well, since adaptive huffman is useless, that's i guess the only option as far as huffman is concerned	2019-05-06 16:52:49
<Shelwien>	well, did you read about fritcode?	2019-05-06 16:53:18
<unic0rn>	now i did. https://encode.ru/threads/1116-Charles-Blooms-new-huffman-optimisation	2019-05-06 16:55:02
<Shelwien>	yes	2019-05-06 16:55:14
	its basically two-stage entropy coding	2019-05-06 16:55:47
	blockwise-static code optimized for data	2019-05-06 16:56:20
<unic0rn>	i was considering fritcode earlier, that is, had that idea just kinda missing the point, i didn't think about adding another stage of entropy coding after that	2019-05-06 16:56:32
<Shelwien>	and fully static code to compress the first code, which has intentionally skewed bit probabilities	2019-05-06 16:57:08
<unic0rn>	but since fritcode is very simple, i was thinking about using one of the codes as an escape code	2019-05-06 16:57:09
	for example, "set fritcode for last decoded symbol to 0"	2019-05-06 16:57:42
<Shelwien>	that's common - eg. deflate has a special code for end-of-block	2019-05-06 16:57:50
<unic0rn>	because rearranging this is far easier than rebuilding whole huffman tree	2019-05-06 16:57:55
	so it could be adaptive	2019-05-06 16:58:32
	and fast at the same time	2019-05-06 16:58:43
<Shelwien>	uh, no	2019-05-06 16:58:44
	optimal fritcode is harder to build than huffman	2019-05-06 16:59:05
	but same block-wise static as huffman	2019-05-06 16:59:35
	can get you better compression	2019-05-06 16:59:42
	since with fritcode, symbol codes are not limited with 1/2^x	2019-05-06 17:00:00
	probabiilities, i mean	2019-05-06 17:00:11
<unic0rn>	well, encoder wouldn't be exactly simple, because it would have to make quite a few estimations to find the most optimal route for switching codes	2019-05-06 17:01:17
	but decoding would be fast	2019-05-06 17:01:22
<Shelwien>	well, you can try	2019-05-06 17:02:04
<unic0rn>	should be clearly faster than adaptive huffman	2019-05-06 17:02:07
	nah. i see no point, unless one's bound to not using range nor arithmetic, or ans	2019-05-06 17:02:28
<FunkyBob>	sorry, how is fritcode different to any other prefix scheme?	2019-05-06 17:02:32
<Shelwien>	but adaptive bitcode wasn't ever used in any practical coder in last 20 years or so	2019-05-06 17:02:51
<FunkyBob>	that post doesn't appear to explain it ... at all	2019-05-06 17:03:17
<Shelwien>	its simply inefficient because of current cpu design	2019-05-06 17:03:18
	FunkyBob: as i said, its a two-stage entropy coding scheme	2019-05-06 17:03:45
	the fritcode itself is like huffman, but based on skewed bit probabilities	2019-05-06 17:04:10
	like 0=1/3, 1=2/3 instead of both 1/2	2019-05-06 17:04:24
	which results in longer codes and better precision of symbol probability approximation	2019-05-06 17:04:49
	and then we just need a 2nd stage code for bytes of 1st code	2019-05-06 17:05:10
<unic0rn>	inefficient... lets talk about inefficient... elapsed time: 00:00:17.203 - current processing of 256k block, far from done, single core	2019-05-06 17:05:24
<Shelwien>	which can be static huffman, or arithmetic, or anything	2019-05-06 17:05:41
	bits with skewed probability are packed to bytes, also with known probability distribution	2019-05-06 17:06:51
<FunkyBob>	unic0rn: using which algorithms?	2019-05-06 17:08:17
<unic0rn>	that information is classified ;)	2019-05-06 17:08:36
	in short: my own	2019-05-06 17:08:41
<FunkyBob>	ok	2019-05-06 17:09:09
	well, here's hoping they get great ratios... for that much time :P	2019-05-06 17:09:21
<unic0rn>	for now, there's hoping it'll work at all	2019-05-06 17:09:42
	gotta finish stage 2 to have something possible to test with decompression, then add stage 3 - entropy coder	2019-05-06 17:10:22
<FunkyBob>	that's a good first goal :)	2019-05-06 17:10:23
<unic0rn>	for now i'm drowning in statistics from stage 1	2019-05-06 17:10:46
<Shelwien>	NNCP basic speed is 100 bytes/s :)	2019-05-06 17:11:17
<unic0rn>	well, creating the slowest encoder around isn't exactly my goal, so i'll let it have that :P	2019-05-06 17:12:00
<Shelwien>	because of that he had to add MT and stream interleaving right away	2019-05-06 17:13:15
	got it 10x faster with that	2019-05-06 17:13:22
	i guess 1kb/s is much better :)	2019-05-06 17:13:35
<unic0rn>	at least i'm faster than that. and MT will help a lot, but that's left for the end	2019-05-06 17:14:21
	well, in the end i'll probably rewrite half of it in assembler	2019-05-06 17:14:47
<danlock>	There are bound to be people who don't mind slow encoders, but if faster encoders with similar compression ratios exist, a slow encoder might not be the way to go. rewriting in ASM will probably improve the speed and decrease the size, but be careful not to introduce new bugs.	2019-05-06 17:19:55
<unic0rn>	i guess i wrote more crazy stuff in asm	2019-05-06 17:22:00
<danlock>	asm is great for reducing size, but it might limit the program to a specific type of processor	2019-05-06 17:23:11
<Shelwien>	asm is very inconvenient for compression research	2019-05-06 17:23:55
	its one thing when you write a codec by format specification	2019-05-06 17:24:21
	but when you have to actually experiment with it - tweak various parameters, precision etc	2019-05-06 17:24:49
	asm just doesn't allow to do it	2019-05-06 17:25:08
<danlock>	If you're doing active research, a high-level language is probably better because it enables you to change specific parts of the encoder for testing more easily. Only after you've found the best possible way to do things should you work on optimization, IMO.	2019-05-06 17:27:42
	Yeah. what Shelwien said. ;)	2019-05-06 17:28:36
<Shelwien>	in compression there's a very little choice of "high-level language" too	2019-05-06 17:28:58
	basically from C and C++	2019-05-06 17:29:08
<unic0rn>	not exactly, danlock. i've got 2 definitions i knew for a fact won't change no matter what. rewrote them in asm, 3x speed increase.	2019-05-06 17:29:33
	kinda helps with testing.	2019-05-06 17:29:38
	and those were basically oneliners, more or less.	2019-05-06 17:29:59
<Shelwien>	yeah, C++ compiler would do it automatically	2019-05-06 17:30:06
<unic0rn>	i have my doubts about that. it would be faster, agreed on that, but those are pretty specific operations	2019-05-06 17:31:01
<danlock>	...and that's why I have so few comments to make on the encode forums, which is why the fact I've been told I don't have permission to reply (the couple of times I've tried to reply) on the encode forums isn't a very big deal.	2019-05-06 17:31:04
<Shelwien>	uh, you probably didn't get the confirmation email	2019-05-06 17:31:32
	what's your login on the forum? same?	2019-05-06 17:31:43
<unic0rn>	of course there are intrinsics, but then that's hardly not optimizing stuff until the very end	2019-05-06 17:31:58
<danlock>	yeah.. same	2019-05-06 17:32:10
<Shelwien>	yeah, "awaiting email confirmation"	2019-05-06 17:32:41
<danlock>	at least it (last time I checked) allows me to mark posts as helpful	2019-05-06 17:32:42
<Shelwien>	switched to normal, you should be able to post now	2019-05-06 17:33:12
	C++ compiler can do function inlining and global optimizations	2019-05-06 17:34:59
	so in many cases, the whole algorithm is just inlined in main()	2019-05-06 17:35:27
<danlock>	Thank you very much. Because you guys (the frequent posters, called "lurkers" by FunkyBob, IIRC, are so much more experienced and active than I am, though, I /very seldom/ have anything to say. Sometimes, though, I feel like I have a valid point.	2019-05-06 17:36:02
<Shelwien>	and then, it can also do expression optimizations when there're constants	2019-05-06 17:36:02
	and there's template syntax	2019-05-06 17:36:27
* danlock thinks.... . o O ( template syntax? )		2019-05-06 17:37:02
	which lets you force the compiler to generate the code for multiple choices of specific constant value	2019-05-06 17:37:06
<danlock>	ahh.	2019-05-06 17:37:22
<Shelwien>	so you can get same performance like with constants	2019-05-06 17:37:28
	but still choose their values in runtime	2019-05-06 17:37:41
<danlock>	Man, it's been too long since I was actively coding	2019-05-06 17:37:42
<Shelwien>	https://github.com/rarten/ooz/blob/master/compr_kraken.cpp#L583	2019-05-06 17:38:48
	as example	2019-05-06 17:38:55
<unic0rn>	yeah, we had that discussion. thing is, in this case i don't think C++ compiler would magically figure out what i'm trying to do and use proper asm instruction on its own. and using intrinsics just throws the "compiler will do that for you" out of the window. in this case, i don't think it would. and sure, whole code in general, compiled by C++ compiler, would be faster. but as it is, it's much	2019-05-06 17:39:09
	easier to maintain for me, because of tons of optimizations built in by design to make it work at reasonable speed at all	2019-05-06 17:39:09
	optimizations that would look much messier in C++, imho	2019-05-06 17:39:46
<danlock>	I took a Uni class in ANSI C and another C class later on (er... no, that was a Calculus class in which the instructor gave me his old copy of a book about coding in C++)	2019-05-06 17:39:52
<Shelwien>	sure, proper manually written code is better than what compilers can do	2019-05-06 17:40:16
<unic0rn>	i'll optimize it further later on, it'll be fast enough regardless of the language	2019-05-06 17:40:20
<Shelwien>	but then, you can usually only support a single target	2019-05-06 17:40:40
	while there's at least 4 now	2019-05-06 17:40:59
<danlock>	:highfive: unic0rn	2019-05-06 17:41:01
<Shelwien>	SSE4/AVX/AVX2/AVX512	2019-05-06 17:41:05
	well, language syntax is important	2019-05-06 17:41:36
<danlock>	AVX512. I think this is the first I've heard of that.	2019-05-06 17:41:39
<Shelwien>	uh, my cpu has it?	2019-05-06 17:41:50
	language syntax strongly affects your choices	2019-05-06 17:42:33
	you won't do stuff that is hard to write in your chosen language	2019-05-06 17:42:55
<danlock>	what cpu do you have?	2019-05-06 17:42:58
<unic0rn>	luckily, forth has almost none	2019-05-06 17:43:02
<Shelwien>	intel 7820X	2019-05-06 17:43:14
	also, even with C++ I sometimes have to write custom preprocessors	2019-05-06 17:44:01
	that is, scripts that generate c++ code from custom syntax	2019-05-06 17:44:40
	here's an example: https://encode.ru/threads/3070-mod_CM-another-paq-submodel?p=59503&viewfull=1#post59503	2019-05-06 17:45:31
<danlock>	Mine's an i5 4670K which runs at 4192MHz under load. CPU-Z v1.88.0.x64 says it supports MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, EM64T, VT-x, AES, AVX, AVX2, FMA3	2019-05-06 17:47:03
<unic0rn>	https://pastebin.com/raw/UMTf92Py	2019-05-06 17:47:33
	as for redefining syntax	2019-05-06 17:47:41
<Shelwien>	sure, but anything actually useful would be much harder	2019-05-06 17:48:27
	like, how would you go about defining a custom float-point type in forth?	2019-05-06 17:49:00
	like, you have a program that worked with lots of 32-bit floats	2019-05-06 17:49:22
	then decided to save memory by reducing mantissa size to 8 bits	2019-05-06 17:49:41
	and saving them as 16-bit values	2019-05-06 17:49:53
<danlock>	FORTH is a language I remember existing LONG ago... but I don't know anything else about it.	2019-05-06 17:49:59
<Shelwien>	its used in bitcoin	2019-05-06 17:50:12
<FunkyBob>	seriously?	2019-05-06 17:50:26
<Shelwien>	yes	2019-05-06 17:50:32
<FunkyBob>	for what?	2019-05-06 17:50:37
<unic0rn>	by rewriting stores and loads. 2 basic operations. assuming calculations can still run in single precision and all you want is storing them as half	2019-05-06 17:50:45
<Shelwien>	custom transaction checking	2019-05-06 17:50:46
<unic0rn>	that's basically rewriting 2 words, preferably in asm i guess	2019-05-06 17:51:02
<Shelwien>	unic0rn: how about working with 32-bit and 16-bit floats in the same code?	2019-05-06 17:51:32
<unic0rn>	then you need separate definitions based on precision, and preferably some conversion words as well	2019-05-06 17:52:16
<Shelwien>	well, in C++ its transparent	2019-05-06 17:52:38
	i can just add a class with operator definitions	2019-05-06 17:53:02
	and change the type of some variables to new class	2019-05-06 17:53:18
<unic0rn>	forth does nothing on its own. which, depending on what you want, may be a good thing	2019-05-06 17:53:35
<Shelwien>	code of actual algorithm would remain exactly the same	2019-05-06 17:53:38
	yes, i know	2019-05-06 17:53:50
	but it lacks too much of that "syntax sugar"	2019-05-06 17:54:10
<unic0rn>	well, in general, same here. you can change some basic blocks upon which the rest of the code is built, and it'll work like nothing happened	2019-05-06 17:54:19
<Shelwien>	it may be possible to implement on your own, but why bother?	2019-05-06 17:54:24
<danlock>	Shelwien: Are you suggesting ideas as a way of stimulating imagination and generating new ideas/thoughts/creativity to potentially find new methods for accomplishing what you'd like to do, or helping us to learn things you're already familiar with, or.. what?	2019-05-06 17:56:02
	Whatever your answer, I like it.	2019-05-06 17:56:14
	My uncle, a chemist (who is mostly focused on spectrography, but also other fascinating things), often asks me for ideas about how to do certain things and it's always been a mind-expanding thing for me.	2019-05-06 17:58:03
<Shelwien>	well, getting feedback is frequently useful	2019-05-06 17:59:05
	i tried just writing down my thoughts about random things	2019-05-06 17:59:41
	but explaining stuff to an actual person works better	2019-05-06 18:00:19
<danlock>	When I was learning about chemistry long ago, I learned (as most beginning chemistry students probably do) about atoms having protons and (usually) neutrons in their nucleus and electrons in their respective "shells" orbiting the nucleus, but it gets a lot more complicated than that. (I'm thinking of your having mentioned that you had been discussing AI and quantum mechanics)	2019-05-06 18:01:16
<unic0rn>	as for missing different targets, in case of my compression it may not matter much. kinda doubt any critical parts will be easily vectorizable. it will benefit from MT, but from SIMD... kinda doubt it	2019-05-06 18:01:33
<Shelwien>	unic0rn: you may be surprised...	2019-05-06 18:02:04
	even basic stuff like memset/memcpy benefits from it	2019-05-06 18:02:17
<danlock>	I usually hope that things I say can "spark" a train of thought in someone that will somehow generate an "aha!" moment ...	2019-05-06 18:02:36
<unic0rn>	except that basic stuff isn't performance critical there	2019-05-06 18:02:39
	what eats up the most time, are loops that are hardly vectorizable	2019-05-06 18:03:14
<Shelwien>	still, new extensions add more registers at least	2019-05-06 18:03:47
	so there're all kind of indirect effects when its done by compiler	2019-05-06 18:04:01
<danlock>	unic0rn: can you say that basic stuff will never, under any circumstances, become performance-critical in the future?	2019-05-06 18:04:04
<Shelwien>	but of course you can't do it manually	2019-05-06 18:04:19
	danlock: its supposedly a solution for the hutter-prize contest	2019-05-06 18:04:50
	there's a time limit too, but its a few hours, so most of things would be ok	2019-05-06 18:05:22
<unic0rn>	danlock: when that basic stuff is called once per 256kB block, it simply doesn't matter	2019-05-06 18:05:41
	because the rest is so much slower	2019-05-06 18:05:59
<Shelwien>	and its true that in theory forth could reduce the decoder size	2019-05-06 18:06:13
<danlock>	I was watching someone code in a game jam a few years ago and I was amazed at how much debugging is taken care of by the top level ... erm..... I can't think of the acronym right now... ...as the coder was writing the source.	2019-05-06 18:06:20
<Shelwien>	which is also counted as part of result in HP	2019-05-06 18:06:24
<unic0rn>	it most likely won't be submitted, assuming it beats the result.	2019-05-06 18:07:14
<Shelwien>	although i kinda doubt that its easy to find a compact forth JIT or interpreter	2019-05-06 18:07:15
<unic0rn>	there are a few, one can always write his own as well	2019-05-06 18:07:57
<Shelwien>	yes, but whats their code size?	2019-05-06 18:08:35
<danlock>	(his desktop was being streamed online... and I was remembering how much less "hand-holding" or "pointing out discrepancies" existed in Turbo Pascal in DOS back before my TBI (traumatic brain injury), which (the TBI) is why I'll often seem scatterbrained	2019-05-06 18:09:09
	)	2019-05-06 18:09:15
<unic0rn>	the one i'm using, 36k	2019-05-06 18:09:27
<danlock>	36k of source code? (I presume so, since that's what you were asking about earlier)	2019-05-06 18:10:12
<Shelwien>	so that's all you need to run HP decoder on a clean machine?	2019-05-06 18:10:25
<unic0rn>	no, 36k forth binary	2019-05-06 18:10:25
<danlock>	oh. (note to self: find an online FORTH reference)	2019-05-06 18:10:59
<unic0rn>	it loads ntdll, kernel32 and kernelbase, so yeah	2019-05-06 18:11:03
<Shelwien>	no bad i guess, but the size of phda9dec is 43636	2019-05-06 18:11:18
	system dlls are ok, but forth itself would be likely counted towards the entry size	2019-05-06 18:11:47
<unic0rn>	as you know, you can throw a ton of forth code on top, and the binary won't grow much	2019-05-06 18:12:06
	of course it would be	2019-05-06 18:12:12
<Shelwien>	yeah, but 43k is the full decoder size atm	2019-05-06 18:12:24
<unic0rn>	this implementation allows bundling of your own code with the base binary. it actually gets there in compiled form	2019-05-06 18:12:31
	because it performs compilation on the fly	2019-05-06 18:12:44
<Shelwien>	so you're handicapped with forth	2019-05-06 18:12:54
<danlock>	Ah. I guess it's formally called merely "Forth" ... my bad	2019-05-06 18:13:02
<unic0rn>	even if, not by much. compression improvement should be much bigger to matter at all	2019-05-06 18:13:48
<Shelwien>	about forth in bitcoin: https://sourceforge.net/p/bitcoin/mailman/message/31397880/	2019-05-06 18:13:51
	uh, that's quite unlikely :)	2019-05-06 18:14:15
<unic0rn>	also, i don't care much about HP i guess. it's mostly a benchmark	2019-05-06 18:14:48
	unlikely... maybe, maybe not	2019-05-06 18:15:11
<Shelwien>	sure, nncp won't win HP either, but might work as advertisement of fabrice's NN library, as expected	2019-05-06 18:15:45
	so getting a good result could be useful for other stuff just as well	2019-05-06 18:16:30
	but a good result would be 16M at least	2019-05-06 18:16:48
	and even that requires a lot of work and multiple components	2019-05-06 18:17:14
	the main problem is actually the memory limit though	2019-05-06 18:18:32
<unic0rn>	well, when i'll finish stage 2, i'll be able to test it without entropy coder, then i'll know 1) if it works or did i screw up something, and 2) what's the estimated ratio, by throwing the result at gzip	2019-05-06 18:18:50
	the memory limit is actually not my problem	2019-05-06 18:19:08
	i've solved that one	2019-05-06 18:19:16
<Shelwien>	better try paq8px for entropy estimation	2019-05-06 18:19:16
	ok :)	2019-05-06 18:20:19
	normally 1gb is barely enough to fit a bt4 match index for enwik8	2019-05-06 18:20:44
<danlock>	The extent to which you should take into account the memory limit is related to the number of testers of your code you want, because they have finite memory limits	2019-05-06 18:20:56
<Shelwien>	and then there's no space left for statistics	2019-05-06 18:20:56
	well, HP contest rules specify a lower memory limit than what is usually available these days	2019-05-06 18:21:36
	(1gb)	2019-05-06 18:21:39
<unic0rn>	take into account that i'm using 256k blocks, but those are configurable. i could perhaps go with up to 4mb or so	2019-05-06 18:21:57
<Shelwien>	without solid compression the results would be even worse :)	2019-05-06 18:22:30
	(ie whole file as one block)	2019-05-06 18:22:50
<danlock>	Steve Gibson at grc.com is the only person I can think of right now who typically codes exclusively in ASM. However, as he's developed his free authentication protocol, SQRL, he's had to use other IDEs for implementation on platforms other than x86.	2019-05-06 18:24:08
<unic0rn>	nah. i'll just put quantum entangled receiver in the decompressor and make it slow enough so i can transmit the whole file by hand	2019-05-06 18:24:15
	oh wait... it doesn't work that way	2019-05-06 18:24:20
<Shelwien>	:)	2019-05-06 18:24:52
<unic0rn>	:P	2019-05-06 18:25:06
<Shelwien>	there's one tricky option that should available	2019-05-06 18:25:19
	its in theory possible to use OS files as reference	2019-05-06 18:25:41
<danlock>	It annoys a lot of people when I say something which seems like a comment out of nowhere and like it will derail the conversation, but it's usually a direct reaction to something I've just read wherever I'm posting.	2019-05-06 18:25:41
<unic0rn>	danlock: i was toying with assembler on zx spectrum first. then i wrote custom video mode + graphics conventer entirely in asm on amiga 600.	2019-05-06 18:26:00
	and then some 256b and similar demoscene stuff on pc	2019-05-06 18:26:09
	all for fun	2019-05-06 18:26:23
	Shelwien: i think that's forbidden	2019-05-06 18:26:55
<Shelwien>	i don't think it is	2019-05-06 18:28:16
<unic0rn>	you cannot use input from external sources, files included	2019-05-06 18:28:24
<danlock>	unic0rn: that's nifty. I love demoscene stuff. What skilled coders can get out of chipsets introduced in the 1970s (like the Atari 8bit computers) and early 80s (Commodore 64/etc.) and PCs in the '90s	2019-05-06 18:28:32
<unic0rn>	system libraries are allowed for calls like i/o, that's it	2019-05-06 18:28:33
	also, Matt will run everything under Linux anyway	2019-05-06 18:28:52
	and complain if it doesn't work, despite being an exe file	2019-05-06 18:29:14
<Shelwien>	maybe, but linux also has plenty of system files, with plenty of text inside	2019-05-06 18:29:16
<danlock>	I forgot to finish my sentence. sigh. What skilled coders can get... ...is really amazing.	2019-05-06 18:29:24
<unic0rn>	yeah. good luck accessing those under wine, if he removed the / mapping	2019-05-06 18:29:42
<danlock>	Mahoney?	2019-05-06 18:29:49
<Shelwien>	yes	2019-05-06 18:29:57
	paq8px already has an option to train the model with its own exe	2019-05-06 18:30:49
	and i don't think that usual system files is explicitly forbidden	2019-05-06 18:31:11
	but its actually far from easy to make use of it	2019-05-06 18:31:25
<danlock>	I haven't seen much of what's going on with him since he stopped actively participating on encode.ru and his domain hasn't been as frequently updated either. It seemed like it coincided with a hurricane that hit the south Florida area a while back.	2019-05-06 18:31:50
<unic0rn>	it's actually nearly impossible. would work only with specific files in specific versions	2019-05-06 18:31:56
	he's running.	2019-05-06 18:32:10
	supposedly a lot.	2019-05-06 18:32:17
<Shelwien>	not necessarily	2019-05-06 18:32:20
<danlock>	isn't that always the case, though? He runs a lot.	2019-05-06 18:32:41
<Shelwien>	for example, you can locate useful text strings in system files	2019-05-06 18:32:50
	and include their hashes in decoder	2019-05-06 18:32:59
<unic0rn>	man pages	2019-05-06 18:33:08
	free dictionary	2019-05-06 18:33:10
<Shelwien>	then find them by hashes on target system	2019-05-06 18:33:14
	yeah	2019-05-06 18:33:15
<danlock>	the northern hemisphere was hit by heavy weather this past couple of seasons, though... not that it necessarily impacted running when in the lower latitudes	2019-05-06 18:34:39
<Shelwien>	danlock: what changed is that he retired from work, and decided to participate in all marathons in US	2019-05-06 18:35:10
	so now has no time for programming	2019-05-06 18:35:20
	https://www.facebook.com/mattmahoneyfl	2019-05-06 18:36:10
<danlock>	ahh. I knew about the retirement, but not about the desire to participate in all marathons. I know of three marathons (and an Iron Man competition, which he might not be interested in) that recently occurred in my region of the western US, and if he wants to run in them all it'll take many years just because many of them happen in different cities and states on the same weekends or same days. He's smart enough to determine an opt	2019-05-06 18:38:17
<Shelwien>	well, it seems to be exactly what he does	2019-05-06 18:39:45
<danlock>	If the desire is to participate in every single marathon, that's a much bigger thing to take on than just a walk/run/bike from coast to coast	2019-05-06 18:39:46
<Shelwien>	also he found a woman there :)	2019-05-06 18:39:50
<danlock>	that'll reduce his availability by a lot even if she's participating in all the marathons with him	2019-05-06 18:40:34
<Shelwien>	well, he somehow still finds time to post on quora	2019-05-06 18:42:15
* danlock laments his own lack of energy and its debilitating effect on his life, which is more sleep than life some days, and more "awake but cognitively-absent" the rest of the time.		2019-05-06 18:42:25
<danlock>	before the TBI, thinking never, ever tired me. after I awoke from the 14-day coma, it did and I learned that my physical energy and cognitive energy are separate, and that the higher-level thinking goes away a lot more quickly than the rudimentary "alert but not so aware" state of living.	2019-05-06 18:45:10
	I'm not looking for sympathy. I'm merely stating my personal experience.	2019-05-06 18:46:07
	prior to the TBI, my memory was very good, and maybe even superb, but I didn't remember everything. Since then, I remember fewer events from each discrete time period, whatever you define that to be, and, as a result, time seems to pass much more quickly for me. shrug	2019-05-06 18:48:07
<Shelwien>	well, I was surprised by Matt's productivity at times	2019-05-06 18:48:28
	like when he started zpaq project	2019-05-06 18:48:37
<danlock>	some people have so much energy	2019-05-06 18:48:39
<Shelwien>	but then, it was based on some pretty weird assumptions	2019-05-06 18:49:18
	so despite investing a lot of work in it, its not really getting more popular	2019-05-06 18:49:42
<danlock>	Sometimes, weird assumptions are where the best results are found at times.	2019-05-06 18:49:47
	doh. sorry for the redundancy... heh.	2019-05-06 18:50:04
<Shelwien>	unlike stuff like lz4, which needs 20 lines of C to decode :)	2019-05-06 18:50:06
<danlock>	brb	2019-05-06 18:50:29
<Shelwien>	well, the main feature in zpaq is forward compatibility	2019-05-06 18:50:40
	the idea was actually introduced in rar	2019-05-06 18:50:54
	and already proved to be useless there	2019-05-06 18:51:14
	but Matt still decided to make his own version	2019-05-06 18:51:26
	so now we have a zpaq config for pi number generation	2019-05-06 18:52:01
	but its not fast, nor has good compression, nor good archive management syntax	2019-05-06 18:52:59
	and even Matt seemingly stopped maintaining it	2019-05-06 18:53:39
<danlock>	How does that work (the pi generator)? I imagine it relies on zpaq's recursive addition from each version to the next of the archive.	2019-05-06 18:53:49
<Shelwien>	zpaq has a simple programming language in its config	2019-05-06 18:54:33
<danlock>	I checked that other C book I have from that calculus class... it's about C, not C++	2019-05-06 18:54:59
<Shelwien>	the idea is that a new compression algorithm can be written in that language	2019-05-06 18:55:03
	and then old zpaq version would be able to decode the archive created by new version with new algorithms	2019-05-06 18:55:37
<danlock>	Have you experimented with that much?	2019-05-06 18:55:40
<Shelwien>	because it would add the config to archive	2019-05-06 18:55:56
	https://encode.ru/threads/1211-Compressing-pi	2019-05-06 18:56:11
<danlock>	I'm glad he put it in the public domain or open sourced it, at least, so it's not going to remain in a backroom at Dell	2019-05-06 18:57:10
<unic0rn>	danlock: https://i.warosu.org/data/g/img/0563/02/1472388528965.png - Shelwien will probably disagree ;)	2019-05-06 18:57:11
<Shelwien>	i think it was his personal project, not related to dell	2019-05-06 18:57:52
<danlock>	I thought that C was more portable than C++ and has become more of a universal fallback as time has passed	2019-05-06 18:58:16
<Shelwien>	since its mostly portable via gcc anyway	2019-05-06 18:58:41
	there'd be C++ when there's C	2019-05-06 18:58:47
	and I agree that C++ standard library is bloated	2019-05-06 18:59:05
	but nobody makes you use it	2019-05-06 18:59:12
	most of C code can be compiled by C++ compiler anyway	2019-05-06 18:59:33
	and C++ has new features which make programming more efficient	2019-05-06 18:59:56
	like not having to write ptr-> before every variable	2019-05-06 19:00:14
	or not having to copy-paste 10 versions of a function tuned to different constant values	2019-05-06 19:00:47
<unic0rn>	you really do love your templates	2019-05-06 19:01:04
<Shelwien>	yes; also coroutines	2019-05-06 19:01:19
<danlock>	Do some of those features require more commenting, or will they remain readable to anyone who understands C++ code?	2019-05-06 19:01:23
	templates can be very useful (remarked the redundant one)	2019-05-06 19:02:05
<Shelwien>	the syntax is usually pretty simple	2019-05-06 19:02:27
	but properly using it requires some experience	2019-05-06 19:02:45
	since there're weird limits in places	2019-05-06 19:03:06
	as advanced example you can see this: https://encode.ru/threads/3077-C-constexpr-experiments	2019-05-06 19:04:09
<danlock>	(re: zpaq's integrated encoder) I guess we need someone with Matt's endurance to take that idea and run with it (no pun intended)	2019-05-06 19:08:06
<Shelwien>	no, its just a failed idea	2019-05-06 19:09:03
	like, its impossible to fit a new-style codec, like zstd, into zpaq	2019-05-06 19:09:44
	because Matt decided that his entropy coding and file format are already perfect	2019-05-06 19:10:40
	it'd be much better to provide a binary plugin interface	2019-05-06 19:11:45
	and maybe asymmetric cryptography for plugin signing	2019-05-06 19:12:13
<danlock>	That's unfortunate. I thought he would be open to improvements/insertions of newer methods into that.	2019-05-06 19:12:22
<Shelwien>	he expected that it would become popular, and people would add support for everything necessary on their own	2019-05-06 19:13:15
	its possible, but	2019-05-06 19:13:39
	(1) too much work because zpaql is basically limited asm	2019-05-06 19:13:56
<danlock>	If he weren't trying to accomplish his running feat, I wouldn't be surprised if he changed zpaq to allow that. I know it's been changed before in ways that made it incompatible with prior versions	2019-05-06 19:14:18
<Shelwien>	(2) people don't like to give away their coders	2019-05-06 19:14:38
<danlock>	I don't know whether those were merely bugfixes, though	2019-05-06 19:14:40
<Shelwien>	(3) progress didn't stop at Matt's entropy coder	2019-05-06 19:15:09
<danlock>	unfortunately, compression time and decompression time are not asymmetric with zpaq	2019-05-06 19:16:02
<Shelwien>	they really are	2019-05-06 19:16:38
	there's BWT and some kind of LZ even in standard version	2019-05-06 19:17:01
<danlock>	oh? I thought it was about the same time for each	2019-05-06 19:17:03
<Shelwien>	BWT is asymmetric, it doesn't need sorting on decode	2019-05-06 19:17:21
	https://github.com/pothos/zpaqlpy	2019-05-06 19:17:37
<danlock>	so not strictly symmetric, but if you use a high-compression mode, won't it take nearly as long to decomp as it did to compress?	2019-05-06 19:19:07
<Shelwien>	well, CM _is_ symmetric	2019-05-06 19:19:41
	and CM _does_ provide the best compression	2019-05-06 19:20:02
	but if you want fast decompression, just don't use it?	2019-05-06 19:20:17
<danlock>	yeah.	2019-05-06 19:20:59
	I've been using rz with the sfx creator provided by ScottX (or somebody at encode) to compress some files more than 7z will, because that includes the encoder with the file. The next version might not be compatible with what was last released by Christian, so that type of compression is needed.	2019-05-06 19:27:00
	...if I want to ensure I can extract what I compressed...	2019-05-06 19:27:55
<Shelwien>	well, Christian never does open source, so you're open of luck here :)	2019-05-06 19:28:57
	you can try lzna/kraken i guess, since it was decompiled not long ago	2019-05-06 19:29:24
	https://encode.ru/threads/3102-Oodle-compressor-open-source?p=59982&viewfull=1#post59982	2019-05-06 19:29:51
<danlock>	I'll probably just do what I've been doing until Christian can find time to release his next binary (he indicated it has better compression for some things), unless something better from Bulat Ziganshin someone else comes along first.	2019-05-06 19:33:24
	*or someone else	2019-05-06 19:33:51
<Shelwien>	sure	2019-05-06 19:34:05
	https://encode.ru/threads/2989-exepack?p=57682&viewfull=1#post57682	2019-05-06 19:35:10
<danlock>	Igor Pavlov increased compression in 18.05 for a lot of my older (and some recent) .7z files (decompressed and recompressed)	2019-05-06 19:35:33
	That's great, and RAR is a good codec for a lot of things, but since updated versions of 7z started compressing better for types of files at which RAR was previously the very best, it's mostly on my PC just for quick path-aware recompressions of ZIP files and stuff.	2019-05-06 19:39:37
	(files I'll usually go back to later and recompress using something which takes longer but will result in a smaller file)	2019-05-06 19:40:30
<Shelwien>	well, rar has its own better sfx, its just a demo	2019-05-06 19:41:09
<danlock>	I wasn't talking specificially about sfx archives, but yeah.. and I think Peazip includes some pre-compressed sfx generators that will make smaller sfx too.	2019-05-06 19:42:54
	Yikes. What I'm typing is starting to sound ... like I'm tired and/or hungry. time for some food. thanks for the conversation. :)	2019-05-06 19:48:39
<Shelwien>	:)	2019-05-06 19:49:04
*** danlock has left the channel		2019-05-06 21:36:39
*** danlock has joined the channel		2019-05-06 21:39:20
<danlock>	info	2019-05-06 21:46:17
<unic0rn>	ERROR: command not found. please upgrade to WINZIP .007	2019-05-06 21:50:15
<danlock>	:)	2019-05-06 21:52:20
*** danlock has left the channel		2019-05-06 23:49:44
*** maniscalco_ has joined the channel		2019-05-07 01:38:43
<maniscalco_>	quiet around here tonight, huh ?	2019-05-07 01:42:12
*** maniscalco_ has left the channel		2019-05-07 02:11:41
<unic0rn>	and left.	2019-05-07 02:26:28
	well, number one rule when you wanna get some coding done: stop looking at IRC	2019-05-07 02:26:43
<FunkyBob>	heh	2019-05-07 02:35:35
<unic0rn>	i'm coding. i'm not looking.	2019-05-07 02:54:42
<FunkyBob>	sure sure :P	2019-05-07 02:57:57
<Shelwien>	https://encode.ru/threads/3110-gcc-9-1-1-released?goto=newpost	2019-05-07 04:47:12
	did a benchmark, somehow gcc82 has the best results?	2019-05-07 05:42:23
	except for PGO which is troublesome to use in gcc	2019-05-07 05:42:42
*** Jibz has joined the channel		2019-05-07 08:08:47
	hi	2019-05-07 08:08:55
<Jibz>	hey :)	2019-05-07 08:09:05
<Shelwien>	what do you think would be the best token layout for LZ4-like coder?	2019-05-07 08:10:08
	64k window, best compression?	2019-05-07 08:10:31
<Jibz>	probably depends on the type of data (text binary), one interesting aspect to me would be if it would be better to make a format that enables efficient optimal parsing (like lz4), or using things like repeat match offsets that improve compression but introduce dependencies	2019-05-07 08:16:21
<Shelwien>	repeat matches are easy enough to handle with optimal parsing	2019-05-07 08:17:04
	lzma does it, for example	2019-05-07 08:17:20
<Jibz>	that would be for values of optimal that are not optimal then, right? :)	2019-05-07 08:17:46
	I should probably say bit-optimal instead	2019-05-07 08:18:22
<Shelwien>	dynamic programming approach should work for any sequential compression method	2019-05-07 08:18:24
	the problem with perfect parsing opt for formats with entropy coding	2019-05-07 08:19:12
	is either it being not sequential (static bitcode with len table in block header)	2019-05-07 08:19:46
	or having to store a copy of all statistics per element of optimizer token array	2019-05-07 08:20:44
	lzma "solves" that by only updating counters once per 128 bytes in optimizer	2019-05-07 08:21:36
<Jibz>	yes, true	2019-05-07 08:22:03
<Shelwien>	but i think it should be still possible to make a perfect version	2019-05-07 08:22:03
	by storing stat diffs per element	2019-05-07 08:22:20
	i already did optimal parsing with entropy coding in cdm	2019-05-07 08:22:46
	so it would be interesting to port it to lzma sometime	2019-05-07 08:23:16
	btw, this is the element struct from lzma:	2019-05-07 08:24:57
	struct COptimal {	2019-05-07 08:24:59
	uint price;	2019-05-07 08:24:59
	uint state;	2019-05-07 08:24:59
	uint prev1IsChar;	2019-05-07 08:24:59
	uint prev2;	2019-05-07 08:24:59
	uint posPrev2;	2019-05-07 08:24:59
	uint backPrev2;	2019-05-07 08:25:00
	uint posPrev;	2019-05-07 08:25:00
	uint backPrev;	2019-05-07 08:25:01
	uint backs[LZMA_NUM_REPS];	2019-05-07 08:25:01
	};	2019-05-07 08:25:02
	so it tracks all 4 rep-distances	2019-05-07 08:25:49
<Jibz>	you could try out all different token layouts on a set of test file and see which gives the best compression on average? I assume people like charles, yann, and ilya already did this, but likely looking for a compromise between speed and ratio	2019-05-07 08:29:52
<Shelwien>	i don't think they really tried	2019-05-07 08:30:57
	i never heard of anybody making an universal parsing optimizer, which would work with any layout	2019-05-07 08:31:18
	so usually once they implement optimal parsing, the format is locked	2019-05-07 08:31:50
	and while there're certainly some choices that are always good	2019-05-07 08:32:29
	like literal runs vs literal flags	2019-05-07 08:32:44
	or rep-distances	2019-05-07 08:32:56
	but i don't think anybody tried to optimize a limited set of len or distance values	2019-05-07 08:33:28
	an ever further step would be to optimize a whole state machine, i suppose	2019-05-07 08:34:45
	btw	2019-05-07 08:35:47
	for bytewise LZ	2019-05-07 08:36:00
	if we could put input data into output buffer (probably at the end of buffer)	2019-05-07 08:36:48
	the same code could be used to copy literal runs and matches	2019-05-07 08:37:15
	might be useful?	2019-05-07 08:37:17
<Jibz>	hmm how do you mean? like a circular buffer?	2019-05-07 08:38:45
<Shelwien>	no, fast coders are usually memory-to-memory anyway	2019-05-07 08:39:51
	but usually literals have to be copyied from input buffer to output	2019-05-07 08:40:12
	and match string from output to output	2019-05-07 08:40:22
	so it has to be different code	2019-05-07 08:40:34
	but if we start by copying compressed input to the end of output buffer	2019-05-07 08:40:57
	it becomes kinda the same	2019-05-07 08:41:02
	btw	2019-05-07 08:41:23
	i also used a similar tricky in my deflate decoder	2019-05-07 08:41:35
	it has a 256-byte literal array just before the 32k window buffer	2019-05-07 08:41:59
	and a literal is turned into a len=1;dist=pos+256-lit match	2019-05-07 08:43:02
	and its still faster like that	2019-05-07 08:43:29
	because otherwise there's an unpredictable branch on token type	2019-05-07 08:43:49
<Jibz>	interesting, and it is faster due to avoiding the branch not the literal table being in the same 64k cache?	2019-05-07 08:45:44
<Shelwien>	well, of course literal table not being cached would make it worse	2019-05-07 08:47:09
	but match copy loop is vectorized	2019-05-07 08:47:42
	and its still faster to copy a whole vector than do a if( is_literal ) { *ptr++=lit; } else {...}	2019-05-07 08:48:46
<Jibz>	nice idea :)	2019-05-07 08:49:27
<Shelwien>	another interesting thing that i have there is __assume( len<32 )	2019-05-07 08:50:12
	modern compilers are rather forceful about converting copy loops to memcpy	2019-05-07 08:50:42
	because they have an optimized/vectorized memcpy intrinsic	2019-05-07 08:51:03
	so when i looked at asm of my match copy loop	2019-05-07 08:51:21
	it actually had 3 versions there	2019-05-07 08:51:43
	the last one being something like if( len>96 ) memcpy( ... );	2019-05-07 08:52:11
	was very tricky to block that	2019-05-07 08:52:56
	i mean, memcpy has versions for all vector extensions, plus loop preparation code at start	2019-05-07 08:54:01
	so it may be efficient for copying megabytes of data	2019-05-07 08:54:14
	but for LZ matches its certainly bad	2019-05-07 08:54:22
	and compilers force-replace any kind of memory copy loop with it	2019-05-07 08:54:39
<Jibz>	that is interesting, though the use of assume seems dangerous in case len might be more	2019-05-07 09:01:05
<Shelwien>	https://godbolt.org/z/msimIx	2019-05-07 09:02:05
	well, its not _that_ smart atm	2019-05-07 09:02:21
	memcpy is apparently only in IntelC version	2019-05-07 09:02:50
	but gcc still generates too many dumb branches	2019-05-07 09:03:05
<Jibz>	gcc has an assume with probability, might be interesting for this	2019-05-07 09:05:05
<Shelwien>	i tested it, doesn't seem to do anything good atm	2019-05-07 09:05:32
<Jibz>	okay :)	2019-05-07 09:05:48
<Shelwien>	builtin_expect is really important though	2019-05-07 09:06:20
<Jibz>	would have been nice if you could tell it the distribution of short to long matches and get a copy that matches	2019-05-07 09:06:39
<Shelwien>	its very useful for making various bounds-checking branches almost free	2019-05-07 09:06:56
	you can actually do it easily enough	2019-05-07 09:07:25
	via PGO	2019-05-07 09:07:36
<Jibz>	good point	2019-05-07 09:07:47
<Shelwien>	but its just not any useful atm	2019-05-07 09:08:05
	IntelC PGO is much better than gcc's	2019-05-07 09:08:19
	gcc's was never useful in my tests	2019-05-07 09:08:29
	but even intel's doesn't do anything that smart	2019-05-07 09:08:41
	!next	2019-05-07 16:30:04