*** complogger has joined the channel		2009-10-05 21:03:23
*** compbooks has joined the channel		2009-10-05 21:04:00
*** toffer has joined the channel		2009-10-05 21:30:42
<toffer>	hi! just wanted to say have a look at your ftp	2009-10-05 21:30:48
<Shelwien>	hi	2009-10-05 21:30:54
	dcc.7z	2009-10-05 21:31:09
	7M only	2009-10-05 21:31:18
	now 10	2009-10-05 21:31:33
<toffer>	it's 78mb	2009-10-05 21:31:38
	unstable connection	2009-10-05 21:31:50
	started with 700kb/s	2009-10-05 21:31:54
	now it's 100	2009-10-05 21:31:57
<Shelwien>	btw, i can enable a shell for you there, if you want ;)	2009-10-05 21:33:38
<toffer>	just too late	2009-10-05 21:35:39
	via ssh?	2009-10-05 21:35:41
<Shelwien>	yeah	2009-10-05 21:35:44
<toffer>	it was a proxy problem all the time	2009-10-05 21:35:56
<Shelwien>	suspected that	2009-10-05 21:36:09
	but i didn't mean for upload	2009-10-05 21:36:13
	there're compilers and stuff	2009-10-05 21:36:23
	so you'd be able to test m1 there or whatever ;)	2009-10-05 21:36:39
	not really a lot of computing resources though	2009-10-05 21:37:00
<toffer>	guess it's not of much use	2009-10-05 21:37:13
	but having a good remote machine would be great	2009-10-05 21:37:22
	gonna eat and have a beer now	2009-10-05 21:37:42
<Shelwien>	%)	2009-10-05 21:37:47
*** pinc has left the channel		2009-10-05 22:19:00
<toffer>	gn8	2009-10-05 23:11:32
*** toffer has left the channel		2009-10-05 23:11:35
*** pinc has joined the channel		2009-10-06 06:31:19
*** Shelwien has left the channel		2009-10-06 08:47:59
*** toffer has joined the channel		2009-10-06 09:46:51
	hi	2009-10-06 10:06:09
	somehow 9 bit precision is enough for stretch ^^	2009-10-06 11:02:44
*** Shelwien has joined the channel		2009-10-06 14:04:25
<Shelwien>	http://toffer.dreamhosters.com/ ;)	2009-10-06 14:05:28
<toffer>	Encoding...done. 22257252/100000000 bytes, 48.00 s that's for 32mb and approx. orders 1,2,4,6	2009-10-06 14:12:30
<Shelwien>	with 4 models?	2009-10-06 14:13:03
<toffer>	yep	2009-10-06 14:13:22
	i can get 0.8% improvement by raising memory	2009-10-06 14:13:31
	but currently it's 4 ordinary models	2009-10-06 14:13:55
	as i said i wanted to replace one of these with a match model	2009-10-06 14:14:05
<Shelwien>	ccmx/bwt are around 20.8M...	2009-10-06 14:15:22
<toffer>	that's not a fair comparison	2009-10-06 14:15:37
	due to the match model	2009-10-06 14:15:40
	for comparison	2009-10-06 14:15:42
	lpaq1 with order 1246, 100000000 -> 22359789 in 114.21 sec. using 51 MB memory	2009-10-06 14:15:55
	as you see the results are close together	2009-10-06 14:16:10
<Shelwien>	well, i didn't say that it's a bad result ;)	2009-10-06 14:16:19
<toffer>	and lpaq1 with m246 gets ccm like compression	2009-10-06 14:16:23
	it's 20.8xxx.xxx ...	2009-10-06 14:16:50
	ccmx like compression	2009-10-06 14:16:55
<Shelwien>	just that imho its necessary to beat at least BWT	2009-10-06 14:17:06
	and accidentally, ccmx has similar results ;)	2009-10-06 14:17:35
<toffer>	as i said replacing a model with a match model will give the performance of ccmx	2009-10-06 14:17:57
	at higher speeds :)	2009-10-06 14:18:07
	at least ignoring filters	2009-10-06 14:18:11
<Shelwien>	well, ccm filters don't apply to enwiks	2009-10-06 14:18:27
<toffer>	not exactly	2009-10-06 14:18:34
	it got some text preprocessing	2009-10-06 14:18:43
<Shelwien>	well, plain ccm surely doesn't	2009-10-06 14:19:00
<toffer>	it quantises an order1 context	2009-10-06 14:19:22
<Shelwien>	!grep skymmer.	2009-10-06 14:19:26
<toffer>	based on c>='a' && ...	2009-10-06 14:19:28
<Shelwien>	damn	2009-10-06 14:19:38
	!grep skymmer.narod	2009-10-06 14:20:02
	ccm 5 = 22 003 958	2009-10-06 14:20:29
	ccmx 5 = 21 013 793	2009-10-06 14:20:34
	ccm_sh1d9e 5 = 22 004 883	2009-10-06 14:20:39
<toffer>	5 is how much memory?	2009-10-06 14:20:49
	as i said it's just 32mb for me	2009-10-06 14:21:01
<Shelwien>	550M	2009-10-06 14:21:05
<toffer>	^^	2009-10-06 14:21:08
	Allocated 262535 kB.	2009-10-06 14:21:32
	Encoding...done. 21451718/100000000 bytes, 47.49 s	2009-10-06 14:21:34
<Shelwien>	i just wanted to show that plain ccm doesn't have a text filter	2009-10-06 14:21:56
<toffer>	it has	2009-10-06 14:22:09
	look at the source	2009-10-06 14:22:12
	it's not a filter	2009-10-06 14:22:31
<Shelwien>	ah, that...	2009-10-06 14:22:37
<toffer>	it just uses some text specific contexts	2009-10-06 14:22:38
<Shelwien>	well, you would use some too	2009-10-06 14:22:50
	if you optimized the masks etc ;)	2009-10-06 14:22:58
	and 21.4 is certainly more impressive	2009-10-06 14:23:31
<toffer>	it's just the increase in memory	2009-10-06 14:23:59
	i'm testing 2gb now	2009-10-06 14:24:06
	not much improvement	2009-10-06 14:24:14
	Allocated 2097543 kB.	2009-10-06 14:24:26
	Encoding...done. 21371640/100000000 bytes, 52.57 s	2009-10-06 14:24:28
	i guess prior to the match model variant	2009-10-06 14:24:36
	i'd release a plain 4 model variant	2009-10-06 14:24:45
<Shelwien>	yeah, its ok	2009-10-06 14:25:07
<toffer>	the small effect of the memory increase from 256mb to 2gb is due to nibble caching	2009-10-06 14:25:21
<Shelwien>	though how do you scale memory use for different models?	2009-10-06 14:25:25
<toffer>	there're two hash tables	2009-10-06 14:25:34
	w8	2009-10-06 14:25:36
	two hash tables for high and low nibbles	2009-10-06 14:34:45
	a larger collision domain helps	2009-10-06 14:34:59
	and i need to separate these due to nibble caching	2009-10-06 14:35:10
	(which successfully removes 26% of cache misses)	2009-10-06 14:35:27
<Shelwien>	well, what i meant	2009-10-06 14:35:42
	is that it might be better to specify the hashtable size separately	2009-10-06 14:35:57
<toffer>	?	2009-10-06 14:36:31
<Shelwien>	if two 16M hashtables work ok	2009-10-06 14:36:45
	that doesn't mean that 32M+32M would be better than 32M+16M	2009-10-06 14:37:00
	well, relative scales	2009-10-06 14:37:15
	also, it would probably be good to optimize the parameters separately for different memory settings	2009-10-06 14:40:19
	afair, m1 is now able to load some parameter profiles at runtime?	2009-10-06 14:40:39
<toffer>	yes	2009-10-06 14:56:09
	as to hash table division	2009-10-06 15:01:47
	i already made some experiments	2009-10-06 15:01:53
	increasing the high nibble's collision domain size helps, since it stores the nibble cache. it boosts both compression and speed	2009-10-06 15:02:19
* Shelwien tried to use precomp to find the pdf titles, but failed		2009-10-06 15:44:29
*** pinc has left the channel		2009-10-06 15:47:47
	afair the file names should be in lexical order (compared to the index pdf) when you look at	2009-10-06 15:47:48
<Shelwien>	you didn't include the indexes ;)	2009-10-06 15:49:06
<toffer>	i did	2009-10-06 15:49:20
	you need to look at the first few pdfs	2009-10-06 15:49:34
	one of these is a toc sheet	2009-10-06 15:49:47
<Shelwien>	%)	2009-10-06 15:49:54
	found it, but i think getting the indexes off ieee would be more convenient	2009-10-06 15:52:12
<toffer>	there's no ieee index list or something like that	2009-10-06 15:53:07
<Shelwien>	there is	2009-10-06 15:53:14
<toffer>	since all articles are just named ieeexplore.pdf	2009-10-06 15:53:15
<Shelwien>	http://www.ieeexplore.ieee.org/xpl/tocresult.jsp?isnumber=30443&isYear=2005	2009-10-06 15:53:18
<toffer>	well, yes	2009-10-06 15:53:40
	i thought you meant something like a file to download	2009-10-06 15:53:52
	but i found nothing like that	2009-10-06 15:54:01
<Shelwien>	well, i'd just make an index out of this, i guess	2009-10-06 15:54:07
<toffer>	yeah the arnumbers map to pdf names	2009-10-06 15:54:18
	btw with more models i was able to lower counter precision w/o any great compression loss	2009-10-06 15:59:16
<Shelwien>	you mean output probability?	2009-10-06 15:59:54
<toffer>	counters	2009-10-06 16:00:04
<Shelwien>	i thought that you only use bytewise fsm?	2009-10-06 16:00:08
<toffer>	you mean 256 states	2009-10-06 16:00:24
	?	2009-10-06 16:00:52
<Shelwien>	yes	2009-10-06 16:01:11
<toffer>	yeah i do	2009-10-06 16:01:24
	i meant the counters within the sse maps	2009-10-06 16:01:30
<Shelwien>	ah	2009-10-06 16:02:57
<toffer>	in 0.4 i had 20 bit counters with notable compression improvement	2009-10-06 16:03:15
	but due to stretch/squash mappings (which are 9/12 bits) 16 bit is sufficient	2009-10-06 16:03:46
<Shelwien>	yeah... also counters are quantized anyway	2009-10-06 16:04:28
<toffer>	yeah	2009-10-06 16:13:13
	that's why i guess	2009-10-06 16:13:16
	but i found a mapping of 9->15 bits to be optimal	2009-10-06 16:13:29
	(stretch)	2009-10-06 16:13:33
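The stretch/squash mappings discussed here are, in paq-style mixers, the logistic pair stretch(p) = ln(p / (1 - p)) and its inverse. A minimal sketch of a table-driven stretch, assuming a 16-bit probability whose top 9 bits index a precomputed table; the bucket midpoints, the float outputs, and all names below are illustrative assumptions, not m1's actual code:

```python
import math

PROB_BITS = 16   # counter precision mentioned in the chat
INDEX_BITS = 9   # assumed: the stretch table is indexed by the top 9 bits


def stretch(p: float) -> float:
    """Logistic stretch: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))


def squash(x: float) -> float:
    """Inverse of stretch: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


# Evaluate stretch at the midpoint of each of the 2^9 buckets, so the
# table never has to handle p == 0 or p == 1.
STRETCH_TABLE = [
    stretch((i + 0.5) / (1 << INDEX_BITS)) for i in range(1 << INDEX_BITS)
]


def stretch_lookup(p16: int) -> float:
    """Stretch a 16-bit probability using only its top 9 bits."""
    return STRETCH_TABLE[p16 >> (PROB_BITS - INDEX_BITS)]
```

With this layout every 128 consecutive 16-bit probabilities share one table entry, which is one way to read the remark that counter precision beyond what the mapping can resolve adds little.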
	and again i removed a lot of parameters	2009-10-06 16:13:47
	(which resulted from the extension to 4 models)	2009-10-06 16:14:07
<Shelwien>	well, for me, optimization is a cheap resource, i guess	2009-10-06 16:15:12
	for example, i didn't run any in more than a month ;)	2009-10-06 16:15:51
	and i don't ever switch off the q9450 ;)	2009-10-06 16:16:11
<toffer>	if i'd known that i'd have asked you to optimize	2009-10-06 16:18:10
	some m1 stuff ^^	2009-10-06 16:18:14
	a ssh login on a 64 bit linux would be nice	2009-10-06 16:18:41
<Shelwien>	only 32-bit, sorry ;)	2009-10-06 16:19:18
<toffer>	it'd still work	2009-10-06 16:19:38
<Shelwien>	you can try logging in to that dreamhost server now	2009-10-06 16:24:58
	same login/pass, same server, but ssh	2009-10-06 16:25:10
<toffer>	thanks	2009-10-06 16:25:51
	but it'd not be useful for optimization i guess	2009-10-06 16:26:06
<Shelwien>	yeah	2009-10-06 16:26:12
<toffer>	but i can use it for file uploads	2009-10-06 16:26:13
<Shelwien>	yeah, as i said, http://toffer.dreamhosters.com now works	2009-10-06 16:26:40
	and it supports scripts and stuff there btw	2009-10-06 16:26:59
*** pinc has joined the channel		2009-10-06 16:27:06
<toffer>	i thought the name is already used	2009-10-06 16:27:07
<Shelwien>	as a username	2009-10-06 16:27:23
	xx.dreamhosters.com are not automatic	2009-10-06 16:27:39
<toffer>	thanks for the login	2009-10-06 16:30:01
<Shelwien>	;)	2009-10-06 16:30:14
<toffer>	i increased the nibble cache w/o changing memory requirements drastically. now it can directly be compared to lpaq	2009-10-06 16:38:09
	lpaq 1246 100000000 -> 22359789 in 112.95 sec. using 51 MB memory.	2009-10-06 16:38:21
	Allocated 49543 kB.	2009-10-06 16:38:43
	Encoding...done. 21956964/100000000 bytes, 46.87 s	2009-10-06 16:38:44
	0.4% better compression while being 2.4 times as fast	2009-10-06 16:39:52
	and better than a comparable ccm without a match model	2009-10-06 16:40:15
	i mean without having a match model	2009-10-06 16:40:25
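The figures quoted for lpaq and m1 can be checked with a few lines of arithmetic. This sketch uses only the sizes and timings pasted above; the variable names are illustrative. Note that the 0.4% figure holds when the saving is measured against the original 100,000,000 bytes; measured against lpaq's output it is closer to 1.8%.

```python
ORIGINAL = 100_000_000  # enwik8 size in bytes

# Results as pasted in the log.
lpaq = {"size": 22_359_789, "seconds": 112.95}
m1 = {"size": 21_956_964, "seconds": 46.87}

saved_bytes = lpaq["size"] - m1["size"]

# Saving expressed relative to the original file and to lpaq's output.
gain_vs_original = 100.0 * saved_bytes / ORIGINAL
gain_vs_lpaq = 100.0 * saved_bytes / lpaq["size"]

# Speed ratio and m1 throughput in units of 1000 bytes per second.
speedup = lpaq["seconds"] / m1["seconds"]
throughput_kb_s = ORIGINAL / m1["seconds"] / 1000

print(f"saved {saved_bytes} bytes")
print(f"{gain_vs_original:.1f}% of original, {gain_vs_lpaq:.1f}% of lpaq output")
print(f"{speedup:.1f}x faster, {throughput_kb_s:.0f} kb/s")
```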
<Shelwien>	what about book1 vs ppmd? ;)	2009-10-06 16:43:12
<toffer>	didn't try	2009-10-06 16:45:13
	if you can give me some numbers i can compare it	2009-10-06 16:45:20
	but still a match model provides much better performance	2009-10-06 16:45:43
<Shelwien>	well, ppmd doesn't have a match model either ;)	2009-10-06 16:46:11
	as to numbers...	2009-10-06 16:46:50
<toffer>	well but it can increase its coding order which would have a similar effect	2009-10-06 16:46:50
<Shelwien>	http://compression.ru/ds/ppmdj.rar	2009-10-06 16:46:58
	i think you should be able to compile it even if you're on linux	2009-10-06 16:47:19
	as to similar effect - not really	2009-10-06 16:47:38
	because then it'd have to flush the tree more frequently	2009-10-06 16:47:57
	which would be slower and/or hurt compression	2009-10-06 16:48:14
	but with a small file, like book1	2009-10-06 16:48:38
	i think ppmd might be good competition	2009-10-06 16:48:54
<toffer>	well of course it has a similar effect	2009-10-06 16:50:20
	higher orders indicate a greater prediction confidence	2009-10-06 16:51:00
<Shelwien>	it _can_ have, but doesn't in practice	2009-10-06 16:51:05
	ppmd would benefit from using a match model the same as m1	2009-10-06 16:51:26
<toffer>	having fixed models makes a big difference	2009-10-06 16:51:50
<Shelwien>	dunno	2009-10-06 16:52:07
	what's interesting here is that ppmd is bytewise	2009-10-06 16:52:18
	and it still would be slower than m1x2 maybe	2009-10-06 16:52:35
	but it should have better compression	2009-10-06 16:52:51
<toffer>	it doesn't work under linux	2009-10-06 16:54:31
	it includes windows.h	2009-10-06 16:54:38
<Shelwien>	did you use a makefile?	2009-10-06 16:55:50
<toffer>	yeah	2009-10-06 16:57:41
<Shelwien>	which one? ;)	2009-10-06 16:57:52
	i'd suggest makefile.gmk for gcc ;)	2009-10-06 16:58:07
<toffer>	i needed to modify the source	2009-10-06 17:00:51
	some defines	2009-10-06 17:00:52
	could you suggest any options?	2009-10-06 17:01:09
<Shelwien>	default?	2009-10-06 17:01:52
	well, also -o12 -m50 -r1 maybe	2009-10-06 17:02:24
	btw, i made these indexes for dcc, do you need them?	2009-10-06 17:05:05
<toffer>	you can upload these	2009-10-06 17:05:58
<Shelwien>	done	2009-10-06 17:06:47
<toffer>	Fast PPMII compressor for textual data, variant J, Oct 6 2009	2009-10-06 17:10:24
	book1: 768771 > 209823, 2.18 bpb, used: 5.4MB, speed: 5024 KB/sec	2009-10-06 17:10:25
	Allocated 49543 kB.	2009-10-06 17:10:45
	Encoding...done. 213728/768771 bytes, 0.36 s	2009-10-06 17:10:46
	model initialisation takes quite a while	2009-10-06 17:10:54
	i tuned ppmd's order	2009-10-06 17:11:02
	6 seems to be optimal	2009-10-06 17:11:13
<Shelwien>	for book1 maybe	2009-10-06 17:11:38
	btw	2009-10-06 17:12:37
	can you also download other dcc years? ;)	2009-10-06 17:12:51
<toffer>	i thought you got these?	2009-10-06 17:13:03
<Shelwien>	i'd build similar indexes then	2009-10-06 17:13:06
	and yeah, i have them	2009-10-06 17:13:15
	but they're messy	2009-10-06 17:13:24
	and i wonder about pdf quality	2009-10-06 17:13:31
	anyway, they're not from ieee xplore	2009-10-06 17:14:18
	so i'd prefer to have it all in a consistent form ;)	2009-10-06 17:14:41
<toffer>	i'll do that	2009-10-06 17:15:02
	but not today	2009-10-06 17:15:04
<Shelwien>	sure	2009-10-06 17:15:08
	btw i just found that they've got a prize for an article ;)	2009-10-06 17:16:29
	i'm not a student unfortunately ;)	2009-10-06 17:16:37
<toffer>	retuned to book1 i get 2k improvement	2009-10-06 17:18:37
	leaving contexts untouched	2009-10-06 17:18:52
<Shelwien>	so 211?	2009-10-06 17:19:00
<toffer>	190 f* = 211732.000000	2009-10-06 17:19:01
<Shelwien>	similar to bwt with 4 models, i guess...	2009-10-06 17:19:28
	but hopefully faster	2009-10-06 17:19:38
<toffer>	a match model will do better as i said	2009-10-06 17:19:50
<Shelwien>	not for book1	2009-10-06 17:19:58
<toffer>	and complete book1 tuning too	2009-10-06 17:19:58
	sure?	2009-10-06 17:20:06
<Shelwien>	there aren't many matches	2009-10-06 17:20:22
<toffer>	did you ever couple a match model as in lpaq	2009-10-06 17:20:23
<Shelwien>	order6 for ppmd is a proof of that	2009-10-06 17:20:28
<toffer>	it's not about matches, it's about the correct determination of context order	2009-10-06 17:20:38
<Shelwien>	we can test my new LZ i guess...	2009-10-06 17:20:50
<toffer>	you mean that fma?	2009-10-06 17:21:07
<Shelwien>	yeah	2009-10-06 17:21:10
	i rewrote it yesterday	2009-10-06 17:21:19
	and now it actually transforms and detransforms stuff	2009-10-06 17:21:41
	only -9M off enwik9 though	2009-10-06 17:22:02
<toffer>	so can you use it like rep now?	2009-10-06 17:22:02
<Shelwien>	kinda	2009-10-06 17:22:10
	it outputs two files though	2009-10-06 17:22:18
	wonder if gcc compiles it...	2009-10-06 17:23:52
<toffer>	not if you cast pointers to ints as you usually do	2009-10-06 17:24:10
	^^	2009-10-06 17:24:12
<Shelwien>	no, i've been more careful this time... in a way	2009-10-06 17:24:35
	it compiles apparently	2009-10-06 17:24:39
	got that?	2009-10-06 17:27:01
	usage is	2009-10-06 17:27:40
	fma-hash source literal_file structure_file	2009-10-06 17:28:04
	fma-dec literal_file structure_file output	2009-10-06 17:28:13
<toffer>	i didn't get anything	2009-10-06 17:29:08
	did you upload it?	2009-10-06 17:29:19
	i need the sources	2009-10-06 17:29:29
<Shelwien>	err... PM'ed	2009-10-06 17:29:30
<toffer>	pm?	2009-10-06 17:29:42
<Shelwien>	http://shelwien.googlepages.com/fma_06.rar	2009-10-06 17:30:15
<toffer>	Allocated 49543 kB.	2009-10-06 17:33:08
	Encoding...done. 21956964/100000000 bytes, 46.87 s -> 2133 kb/s	2009-10-06 17:33:10
	cm@e051:~/shared/testset/enwik$ ../../projects/temp/PPMd e -m50 -o10 -r1 enwik8 -f/dev/null	2009-10-06 17:33:11
	Fast PPMII compressor for textual data, variant J, Oct 6 2009	2009-10-06 17:33:13
	enwik8:100000000 >22632473, 1.38 bpb, used: 39.0MB, speed: 1593 KB/sec	2009-10-06 17:33:14
	cm@e051:~/shared/testset/enwik$ ../../projects/temp/PPMd e -m50 -o12 -r1 enwik8 -f/dev/null	2009-10-06 17:33:16
	Fast PPMII compressor for textual data, variant J, Oct 6 2009	2009-10-06 17:33:17
	enwik8.pmd already exists, overwrite?: <Y>es, <N>o, <A>ll, <Q>uit?y	2009-10-06 17:33:19
	enwik8:100000000 >22719967, 1.39 bpb, used: 49.7MB, speed: 1413 KB/sec	2009-10-06 17:33:20
<Shelwien>	what?	2009-10-06 17:33:33
<toffer>	it's faster than ppmd	2009-10-06 17:33:53
	with better compression	2009-10-06 17:34:02
	for enwik	2009-10-06 17:34:09
	i'm retrying order 8	2009-10-06 17:34:19
	don't know what'd be best for e8	2009-10-06 17:34:26
<Shelwien>	well, for enwik there're memory troubles	2009-10-06 17:34:43
	the thing which Shkarin does to free some memory is just too weird to do any good	2009-10-06 17:35:06
<toffer>	don't know	2009-10-06 17:35:58
	i got collisions	2009-10-06 17:36:01
	but testing with limited memory should be fair	2009-10-06 17:36:25
<Shelwien>	it's still better than flushing all the stats after each few MBs	2009-10-06 17:37:00
<toffer>	the best result i got for ppmd is order8	2009-10-06 17:37:03
	enwik8:100000000 >22524820, 1.37 bpb, used: 35.8MB, speed: 2106 KB/sec	2009-10-06 17:37:05
	it runs at about the same speed here	2009-10-06 17:37:13
	i used r1	2009-10-06 17:37:23
	which doesn't flush the model	2009-10-06 17:37:33
<Shelwien>	it does	2009-10-06 17:37:38
	unfortunately	2009-10-06 17:37:42
<toffer>	there's not any better option built in	2009-10-06 17:37:44
<Shelwien>	well, sure	2009-10-06 17:37:49
<toffer>	-rN - set method of model restoration at memory insufficiency:	2009-10-06 17:38:04
	-r0 - restart model from scratch (default)	2009-10-06 17:38:05
	-r1 - cut off model (slow)	2009-10-06 17:38:07
<Shelwien>	i know ;)	2009-10-06 17:40:14
<toffer>	but that's not freeing the model	2009-10-06 17:40:45
	it rebuilds it	2009-10-06 17:40:49
<Shelwien>	it cuts it down until 75% memory is free	2009-10-06 17:41:07
	and that's slow and hurts compression	2009-10-06 17:41:19
<toffer>	still the memory constraint is 50 mb	2009-10-06 17:41:32
<Shelwien>	well, memory issues are more complex to handle in ppmd	2009-10-06 17:42:29
	but it doesn't mean that it's fair to compare it knowing that	2009-10-06 17:42:42
<toffer>	don't know	2009-10-06 17:43:04
	i can increase the memory	2009-10-06 17:43:08
<Shelwien>	yeah, the overflow handling has to be improved	2009-10-06 17:43:12
	but atm we know that ppmd is bad at that	2009-10-06 17:43:22
<toffer>	on the other hand i could say that it's unfair to compare it without having a match model	2009-10-06 17:43:29
<Shelwien>	so what's the point of dwelling on that?	2009-10-06 17:43:35
<toffer>	nothing	2009-10-06 17:43:40
	but it's a good ppm implementation	2009-10-06 17:43:51
<Shelwien>	as i said... ppmd practically doesn't have a match model	2009-10-06 17:43:52
<toffer>	it's not about coding matches separately	2009-10-06 17:44:05
	it's just about knowing the coding order at each step	2009-10-06 17:44:12
	which has a significant influence for fixed models	2009-10-06 17:44:25
<Shelwien>	no, i mean that PPM in theory includes the features of a match model	2009-10-06 17:44:47
	but in practice ppmd overflow handling cuts that down	2009-10-06 17:45:06
<toffer>	still it can access higher order statistics	2009-10-06 17:45:25
	but the overflow handling isn't the best. i agree here	2009-10-06 17:45:45
<Shelwien>	well, you can set it to -o8 or whatever you want	2009-10-06 17:45:45
	anyway, you can compare it with 2G of memory ;)	2009-10-06 17:46:07
<toffer>	the best configuration i found was order 8	2009-10-06 17:46:12
	that's what i call unfair ^^	2009-10-06 17:46:26
	but a good thing would be to test on a file where no overflows happen	2009-10-06 17:46:42
	and then i'd limit it to order 6	2009-10-06 17:47:40
<Shelwien>	that's kinda what i meant	2009-10-06 17:48:12
<toffer>	i tried to compile fma	2009-10-06 17:49:49
	without success	2009-10-06 17:49:52
	but i'll stop for today	2009-10-06 17:49:59
	since i need to do something for my thesis, too	2009-10-06 17:50:06
	-.-	2009-10-06 17:50:08
	just wasted too much time	2009-10-06 17:50:13
	cm@e051:/mnt/shared_extern/projects/temp/fma/fma_06$ gcc -O3 -o fma-hash fma-hash.cpp -lstdc++	2009-10-06 17:50:26
	In file included from fma-hash.cpp:43:	2009-10-06 17:50:28
	hashbuf.inc:29: error: expected constructor, destructor, or type conversion before ‘(’ token	2009-10-06 17:50:29
	hashbuf.inc: In function ‘void HashBufInit()’:	2009-10-06 17:50:31
	hashbuf.inc:36: error: ‘hridx’ was not declared in this scope	2009-10-06 17:50:32
	hashbuf.inc: In function ‘uint HashFind()’:	2009-10-06 17:50:34
	hashbuf.inc:47: error: ‘hridx’ was not declared in this scope	2009-10-06 17:50:35
	hashbuf.inc: In function ‘void HashIndex()’:	2009-10-06 17:50:37
<Shelwien>	well, i compiled it on DH	2009-10-06 17:50:38
<toffer>	hashbuf.inc:68: error: ‘hridx’ was not declared in this scope	2009-10-06 17:50:38
	cm@e051:/mnt/shared_extern/projects/temp/fma/fma_06$	2009-10-06 17:50:40
<Shelwien>	there's a __declspec accidentally	2009-10-06 17:50:59
	but now it segfaults on any file	2009-10-06 17:51:08
	well, i'd check what happens later, food calls ;)	2009-10-06 17:51:38
<toffer>	enjoy	2009-10-06 17:54:47
<Shelwien>	back...	2009-10-06 18:12:46
	damn, why does gcc always have to be so annoying...	2009-10-06 18:27:59
	apparently, it doesn't allow negative array indexes	2009-10-06 18:28:29
	now, why?	2009-10-06 18:28:31
	ok, now it works	2009-10-06 18:30:18
	toffer?	2009-10-06 18:30:47
<toffer>	sorry	2009-10-06 18:53:50
	as i said i'm gonna work on my thesis now	2009-10-06 18:53:57
	just wasted too much time today	2009-10-06 18:54:03
<Shelwien>	...can't process enwik8 on dreamhost %)	2009-10-06 19:00:30
*** pinc has left the channel		2009-10-06 20:48:58
	            enwik8:200   enwik8:64   enwik8:32   Ecoli:200    Ecoli:64    Ecoli:32	2009-10-06 20:56:19
	literal       99782568    97838655    95601834     4596028     4581466     4572494	2009-10-06 20:56:19
	structure         4828      176308      728008         592        1708        3028	2009-10-06 20:56:19
	ppmd-o8m50    22508525    22459174    22480285     1172440     1169545     1168017	2009-10-06 20:56:19
	...	2009-10-06 20:56:19
	                E_coli      enwik8	2009-10-06 20:56:20
	source         4638690   100000000	2009-10-06 20:56:22
	ppmd-o8m50     1182206    22524842	2009-10-06 20:56:24
	fma preprocessing results ;)	2009-10-06 20:56:49
	toffer?	2009-10-06 20:56:52
<toffer>	yeah	2009-10-06 21:01:43
	looks like it hurts on enwik	2009-10-06 21:03:02
	probably it breaks the contexts	2009-10-06 21:03:13
	?	2009-10-06 21:03:14
<Shelwien>	that's because ppmd can't compress the structure file	2009-10-06 21:03:16
<toffer>	what does it contain?	2009-10-06 21:03:40
<Shelwien>	literal lengths and match offsets/lens	2009-10-06 21:03:57
	12-byte records	2009-10-06 21:04:11
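The two-file output described here (a literal file plus a structure file of 12-byte records holding literal lengths and match offsets/lengths) can be illustrated with a toy long-match remover. This is a sketch under stated assumptions, not fma's actual algorithm or file format: the record layout (three little-endian 32-bit fields), the use of absolute match offsets, the block-hash match finder, and the names `split` and `join` are all hypothetical.

```python
import struct

# Assumed record layout: literal length, match offset, match length.
RECORD = struct.Struct("<III")  # 3 x uint32 = 12 bytes


def split(data: bytes, min_len: int = 32):
    """Split data into (literals, structure) by removing long repeats."""
    index = {}        # block content -> earliest aligned position
    literals = bytearray()
    structure = bytearray()
    lit_start = 0     # start of the pending literal run
    indexed = 0       # next aligned block to add to the index
    i, n = 0, len(data)
    while i + min_len <= n:
        # Index aligned blocks that end at or before position i.
        while indexed + min_len <= i:
            index.setdefault(data[indexed:indexed + min_len], indexed)
            indexed += min_len
        src = index.get(data[i:i + min_len])
        if src is None:
            i += 1
            continue
        # Extend the match forward as far as it goes.
        length = min_len
        while i + length < n and data[src + length] == data[i + length]:
            length += 1
        literals += data[lit_start:i]
        structure += RECORD.pack(i - lit_start, src, length)
        i += length
        lit_start = i
    # Trailing literals go into a final record with a zero-length match.
    literals += data[lit_start:]
    structure += RECORD.pack(n - lit_start, 0, 0)
    return bytes(literals), bytes(structure)


def join(literals: bytes, structure: bytes) -> bytes:
    """Rebuild the original data from the two streams."""
    out = bytearray()
    lit_pos = 0
    for lit_len, offset, length in RECORD.iter_unpack(structure):
        out += literals[lit_pos:lit_pos + lit_len]
        lit_pos += lit_len
        for k in range(length):  # byte-wise copy handles overlapping matches
            out.append(out[offset + k])
    return bytes(out)
```

Records packed as raw binary integers like this tend to look like noise to a text-oriented model, which fits the remark that ppmd cannot compress the structure file well.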
<asmodean>	hm	2009-10-07 00:06:50
<Shelwien>	m?	2009-10-07 00:07:03
<asmodean>	6gb of pngs + masks i want to merge into 32-bit bitmaps and then LZMA	2009-10-07 00:07:10
	wonder if i have enough temporary space for this ;p	2009-10-07 00:07:18
	each file is 150mb heh	2009-10-07 00:07:31
<Shelwien>	compress them first? ;)	2009-10-07 00:07:38
	with pngcrush at least?	2009-10-07 00:07:45
<asmodean>	i wish 7zip/winrar were smart enough to contextually decompress known formats	2009-10-07 00:08:16
	like recognize zlib streams and decompress them before compressing ;p	2009-10-07 00:08:35
<Shelwien>	well, precomp should support pngs	2009-10-07 00:08:47
<asmodean>	precomp?	2009-10-07 00:09:00
<Shelwien>	http://schnaader.info	2009-10-07 00:09:11
<asmodean>	ah, well that's the idea but it still needs temp space	2009-10-07 00:10:02
	that's what i am doing manually right now converting all the pngs to bitmaps ;p	2009-10-07 00:10:13
	haha but look prepaq!	2009-10-07 00:10:41
	those paq algorithms take like 5000,00000 years to run	2009-10-07 00:10:57
<Shelwien>	there's lpaq i think	2009-10-07 00:11:07
	its considerably faster (though not as good)	2009-10-07 00:11:37
<asmodean>	yeah but lpaq is only on one file	2009-10-07 00:11:45
	i need to gain efficiency from repetition between these files	2009-10-07 00:11:53
<Shelwien>	huh.	2009-10-07 00:12:16
	i think that's what my new tool is for ;)	2009-10-07 00:12:23
	you can try this though: http://haskell.org/bz/rep.zip	2009-10-07 00:12:52
<asmodean>	i'll just let it gobble up 200gb and then lzma it	2009-10-07 00:13:07
<Shelwien>	not the best idea if there're many similarities	2009-10-07 00:13:32
	try that rep first	2009-10-07 00:13:40
	(+lzma)	2009-10-07 00:13:44
<toffer>	gn8 guys	2009-10-07 00:13:54
<asmodean>	what's it do?	2009-10-07 00:13:57
*** toffer has left the channel		2009-10-07 00:13:59
<Shelwien>	rep? finds long repetitions at large distances	2009-10-07 00:14:19
	and removes these ;)	2009-10-07 00:14:38
<asmodean>	huh	2009-10-07 00:14:52
<Shelwien>	a preprocessor too, like precomp	2009-10-07 00:14:55
<asmodean>	let's see what it does to these images containing animations	2009-10-07 00:15:00
	(very slight differences)	2009-10-07 00:15:16
	haha	2009-10-07 00:16:53
	Detailed line noise in Russian *****************	2009-10-07 00:17:05
<Shelwien>	?	2009-10-07 00:17:18
<asmodean>	the russian characters don't show well in my locale	2009-10-07 00:17:43
<Shelwien>	its not necessary for it to work ;)	2009-10-07 00:18:19
<asmodean>	it did a lot worse than png ;p	2009-10-07 00:18:43
	~38mb vs 22mb	2009-10-07 00:18:51
<Shelwien>	that's ok probably	2009-10-07 00:19:01
	as its a preprocessor	2009-10-07 00:19:09
	and also there're options	2009-10-07 00:19:17
<asmodean>	oh i thought it preprocessed and then compressed	2009-10-07 00:19:18
<Shelwien>	its output would be smaller probably	2009-10-07 00:19:41
<asmodean>	so if i were to rar this and a png we'd have a good test	2009-10-07 00:19:51
<Shelwien>	if you'd run it like rep -l32	2009-10-07 00:20:01
	its default minmatchlen is 512 bytes	2009-10-07 00:20:22
<asmodean>	for this data i suspect longer is better	2009-10-07 00:20:40
	nope	2009-10-07 00:20:47
	-l32 got it down to 17mb	2009-10-07 00:20:53
<Shelwien>	sure	2009-10-07 00:21:00
	and that should be still compressible	2009-10-07 00:21:10
<asmodean>	yeah the 17mb rep output compressed down to 7mb with winrar	2009-10-07 00:23:29
	png 22mb -> 21mb	2009-10-07 00:23:34
	png needs some options other than deflate :P	2009-10-07 00:24:00
<Shelwien>	it kinda has	2009-10-07 00:24:20
	as i said, try pngcrush for it	2009-10-07 00:24:28
<asmodean>	i have	2009-10-07 00:24:33
<Shelwien>	and then pngout+deflopt if that's not enough	2009-10-07 00:24:37
<asmodean>	pngcrush usually only gets 1-2% better than typical '-9' zlib compression	2009-10-07 00:24:59
<Shelwien>	well, i meant that png actually has these delta filters	2009-10-07 00:25:15
<asmodean>	yeah i know	2009-10-07 00:25:23
	and you can use a different filter per line	2009-10-07 00:25:34
	i think photoshop tries them all when saving pngs ;p	2009-10-07 00:26:09
	makes it super slow but its files are slightly smaller	2009-10-07 00:26:19
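The per-line filters mentioned here can be sketched as follows. PNG defines five filter types; this example implements only None (0), Sub (1), and Up (2), and picks one per scanline with the minimum-sum-of-absolute-values heuristic described in the PNG specification. The function names are illustrative, and this is not a claim about how any particular encoder chooses filters.

```python
def filter_none(row: bytes, prev: bytes, bpp: int) -> bytes:
    """Type 0: scanline is stored unchanged."""
    return bytes(row)


def filter_sub(row: bytes, prev: bytes, bpp: int) -> bytes:
    """Type 1: difference from the byte bpp positions to the left."""
    return bytes(
        (row[i] - (row[i - bpp] if i >= bpp else 0)) & 0xFF
        for i in range(len(row))
    )


def filter_up(row: bytes, prev: bytes, bpp: int) -> bytes:
    """Type 2: difference from the byte directly above."""
    return bytes((row[i] - prev[i]) & 0xFF for i in range(len(row)))


FILTERS = {0: filter_none, 1: filter_sub, 2: filter_up}


def cost(filtered: bytes) -> int:
    """Sum of absolute values, treating each byte as signed."""
    return sum(b if b < 128 else 256 - b for b in filtered)


def choose_filter(row: bytes, prev: bytes, bpp: int):
    """Return (filter_type, filtered_row) with the lowest heuristic cost."""
    best = min(FILTERS, key=lambda t: cost(FILTERS[t](row, prev, bpp)))
    return best, FILTERS[best](row, prev, bpp)
```

For a smooth horizontal gradient, `choose_filter` picks Sub; for a scanline identical to the one above it, Up yields all zeros and wins.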
<Shelwien>	anyway, this rep has lots of options	2009-10-07 00:26:36
	and you can experiment with these, starting with -v ;)	2009-10-07 00:26:51
<asmodean>	haha well it's fun but for practical purposes i'm still going with LZMA on all the tgas ;p	2009-10-07 00:27:17
	oh i should winrar the uncompressed file for comparison	2009-10-07 00:27:32
	18mb	2009-10-07 00:29:10
	so rep helped a lot	2009-10-07 00:29:17
<Shelwien>	rep+lzma should be better	2009-10-07 00:30:03
	they all have their limits on dictionary and window size	2009-10-07 00:30:25
	deflate had to work in 64k of memory so it's the worst of all	2009-10-07 00:30:53
<asmodean>	yeah tweaking is always possible. for general purpose use though you're not going to tweak much	2009-10-07 00:31:01
	at least winrar is pretty fast	2009-10-07 00:31:14
<Shelwien>	and rar won't see a repetition further than 4M	2009-10-07 00:31:17
<asmodean>	lzma is slow as SHIT	2009-10-07 00:31:17
	it's like encasing your data in ice	2009-10-07 00:31:23
	to get it out you have to chip away for days	2009-10-07 00:31:28
<Shelwien>	;)	2009-10-07 00:31:39
	you know, its not necessary to use lzma at ultra mode ;)	2009-10-07 00:31:58
<asmodean>	haha when i started out i even tweaked ultra mode's options upwards	2009-10-07 00:32:15
	so "ultra" is a compromise for me now :P	2009-10-07 00:32:23
	what's a good general purpose level of lzma?	2009-10-07 00:32:36
<Shelwien>	well, anyway, even at its best lzma won't see matches further than 1G or something	2009-10-07 00:33:09
	so rep is preferable for such cases	2009-10-07 00:33:22
	rep + lzma with smaller window	2009-10-07 00:33:29
	and i don't know what's a "general purpose" ;)	2009-10-07 00:33:54
<asmodean>	i should do this compression on my htpc	2009-10-07 00:34:10
	quad core	2009-10-07 00:34:12
<Shelwien>	sure ;)	2009-10-07 00:34:21
<asmodean>	it's faster than my server which runs underclocked at 1.9ghz instead of 3.2 :(	2009-10-07 00:34:31
<Shelwien>	btw, did you see my vectorized rangecoder? ;)	2009-10-07 00:34:47
<asmodean>	nope i've been ignoring this place ;)	2009-10-07 00:35:00
<Shelwien>	sometimes i'm thinking about recommending it to Igor	2009-10-07 00:35:01
	its in the topic (ccm)	2009-10-07 00:35:10
	so if i do, you might have to reverse it someday ;)	2009-10-07 00:35:43
<asmodean>	the worst thing is reversing some algorithm that feels like it came from a library but you can't tell which :>	2009-10-07 00:37:06
<Shelwien>	i guess you didn't encounter much intelc code ;)	2009-10-07 00:37:57
<asmodean>	hm 7zip can't use more than 2 threads	2009-10-07 00:38:02
	why because intelc inlines all over the place?	2009-10-07 00:38:17
<Shelwien>	yeah, and does some even worse stuff ;)	2009-10-07 00:38:31
	like reordering and vectorizing weird places	2009-10-07 00:38:52
	once it vectorized rangecoder i/o %)	2009-10-07 00:39:14
	in fpaq0pv4B	2009-10-07 00:39:23
<asmodean>	does intelc produce double-digit performance improvements just recompiling with it?	2009-10-07 00:39:54
<Shelwien>	well, if you count in percents, then yeah, probably	2009-10-07 00:40:21
	also depending on tasks	2009-10-07 00:40:30
	but i've got 20% speed improvement	2009-10-07 00:42:26
	from recompiling unrar.dll with intelc	2009-10-07 00:42:33
<asmodean>	surprised more people don't use it	2009-10-07 00:42:37
<Shelwien>	they do in fact	2009-10-07 00:42:45
<asmodean>	maybe i'm lucky the jp dudes haven't discovered it then :)	2009-10-07 00:43:08
<Shelwien>	well, you can search for "GenuineIntel" string in executables and dlls ;)	2009-10-07 00:44:32
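The string search suggested here takes only a few lines. A minimal sketch, assuming the marker appears as plain ASCII bytes in the file; `contains_marker` is an illustrative name, and a hit only shows that the vendor string is present (for example in a CPU dispatch routine), not which compiler produced the binary.

```python
def contains_marker(path, marker=b"GenuineIntel", chunk_size=1 << 20):
    """Return True if the byte string `marker` occurs anywhere in the file."""
    overlap = len(marker) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            # Prepend the end of the previous window so a marker that
            # straddles a chunk boundary is still found.
            window = tail + chunk
            if marker in window:
                return True
            tail = window[-overlap:] if overlap else b""
```

Reading in chunks keeps memory use flat on large executables and dlls.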
<asmodean>	worst optimization headache i've seen is 'whole program' optimization	2009-10-07 00:45:05
<Shelwien>	i always use it ;)	2009-10-07 00:45:21
<asmodean>	embeds lovely assumptions about things being in registers/stack/etc across several levels of function calls	2009-10-07 00:45:23
<Shelwien>	well, i'd use worse things in manual asm though ;)	2009-10-07 00:49:44
	so its all good ;)	2009-10-07 00:49:48
	at least compilers don't keep values in flags across function calls ;)	2009-10-07 00:50:41
	and don't use esp for general i/o ;)	2009-10-07 00:51:08
<asmodean>	heh but manual asm is much harder to write much of	2009-10-07 00:51:24
<Shelwien>	not really, depending on task though	2009-10-07 00:51:38
<asmodean>	i'll take a handful of manual asm over a whole executable of optimized bullshit any day ;p	2009-10-07 00:51:41
<Shelwien>	huh. wanna look at my old compressor written in asm?	2009-10-07 00:52:10
<asmodean>	not really :) i'd rather look at fun easy to understand LZSS variations over either :)	2009-10-07 00:53:13
<Shelwien>	actually i only stopped using asm because of intelc ;)	2009-10-07 00:53:23
	there're no tools for similar global optimization	2009-10-07 00:53:43
	and automatic vectorization	2009-10-07 00:53:47
<asmodean>	lately i spend a lot more time reversing crypto/obfuscation :( ugh	2009-10-07 00:53:49
<Shelwien>	well, its all easy enough if they don't use low-level stuff	2009-10-07 00:54:32
	like drivers etc	2009-10-07 00:54:36
<asmodean>	the only system that bothers me does constant self-checks (anti-debugger, crcs of code sections etc) with inline code	2009-10-07 00:55:28
	makes it terribly irritating to trace the algorithms	2009-10-07 00:55:37
	so i gave up and just dynamically patched it in between checks :P	2009-10-07 00:56:08
	heh	2009-10-07 00:56:09
<Shelwien>	why don't you just block read access to code pages	2009-10-07 00:56:24
	and handle the exception ;)	2009-10-07 00:56:36
<asmodean>	well i'd have to automate backtracking to the code doing the check to get back to the mainline code	2009-10-07 00:57:19
<Shelwien>	i mean, that'd allow you to sniff out all the locations	2009-10-07 00:57:49
<asmodean>	i was finding the checks with memory breakpoints	2009-10-07 00:58:18
<Shelwien>	good too, but there're not many of these ;)	2009-10-07 00:58:46
	and also protections sometimes block them	2009-10-07 00:59:05
<asmodean>	this was a novice system. but since he coded it himself he could make things more annoying by embedding the protection in his algorithms	2009-10-07 00:59:52
	like if it detects you, it sometimes doesn't fail. instead it heads off down some useless path of code	2009-10-07 01:00:12
	which doesn't decrypt data correctly :P	2009-10-07 01:00:26
<Shelwien>	worst case imho is when program works, but slightly wrong	2009-10-07 01:00:45
<asmodean>	right	2009-10-07 01:00:50
<Shelwien>	!next	2009-10-07 01:52:00