*** scott___ has joined the channel		2010-01-22 23:55:48
<scott___>	anybody here	2010-01-23 00:30:11
*** scott___ has left the channel		2010-01-23 00:32:04
<Shelwien>	no patience :)	2010-01-23 00:33:36
*** STalKer-X has joined the channel		2010-01-23 04:21:42
*** STalKer-Y has left the channel		2010-01-23 04:23:16
*** pinc has joined the channel		2010-01-23 07:26:12
*** M4ST3R has joined the channel		2010-01-23 12:20:54
<M4ST3R>	hi	2010-01-23 12:20:57
*** pmcontext has joined the channel		2010-01-23 12:55:46
<pmcontext>	hi	2010-01-23 12:56:22
*** mike_____ has joined the channel		2010-01-23 13:21:31
*** M4ST3R has left the channel		2010-01-23 13:50:42
*** mike_____ has left the channel		2010-01-23 14:29:40
	o.o	2010-01-23 14:52:46
*** mike_____ has joined the channel		2010-01-23 15:03:38
*** pinc has left the channel		2010-01-23 15:25:08
*** encode has joined the channel		2010-01-23 15:36:02
*** encode has left the channel		2010-01-23 15:49:28
*** Shelwien has left the channel		2010-01-23 17:17:40
*** Guest9968193 has joined the channel		2010-01-23 17:17:43
*** mike_____ has left the channel		2010-01-23 18:15:39
<Shelwien>	respawn!	2010-01-23 19:07:57
	pmcontext: seen http://encode.dreamhosters.com/showpost.php?p=10875&postcount=10 ?	2010-01-23 19:08:30
<pmcontext>	hi shelwien :D	2010-01-23 19:09:02
<Shelwien>	hi	2010-01-23 19:09:07
<pmcontext>	seeing now	2010-01-23 19:09:54
*** scott___ has joined the channel		2010-01-23 19:15:11
<Shelwien>	hi	2010-01-23 19:15:22
<pmcontext>	it seems to do very well even on enwik8, 21MB !!	2010-01-23 19:15:25
	tree is also fixed weights ?	2010-01-23 19:15:52
<Shelwien>	well, BWT does 20.6M on enwik, CM has to be better	2010-01-23 19:15:58
	tree has not only fixed weights, but completely same mixing	2010-01-23 19:16:12
<pmcontext>	i see	2010-01-23 19:16:34
<Shelwien>	better results are due to DEPTH=8192 cutoff in green	2010-01-23 19:16:35
	freqs are not completely the same	2010-01-23 19:16:46
	...actually it looks like i'm still running green on enwik8 :)	2010-01-23 19:18:23
	since like 10 hours ago :)	2010-01-23 19:18:28
<pmcontext>	o.o omg	2010-01-23 19:18:43
<scott___>	it takes 10 hours to run?	2010-01-23 19:18:54
<Shelwien>	well, its slow even with DEPTH=8192	2010-01-23 19:18:56
	more like 20 i guess :)	2010-01-23 19:19:14
	its only 65.5M of 100 now :)	2010-01-23 19:19:26
	also not that its a 3.52Ghz Q9450 :)	2010-01-23 19:19:56
	and it runs on ramdrive too :)	2010-01-23 19:20:11
	*also note	2010-01-23 19:20:36
<pmcontext>	tree byte based mixer ?	2010-01-23 19:22:19
<scott___>	!Shelwien I got my best compresion on a modifiyes arb255 your fpaq0mw is better since its nonstationary. Today I am going to try to put the same modifications in that code and test on DNA to see how good the combo is	2010-01-23 19:22:32
<Shelwien>	%)	2010-01-23 19:22:43
	pmcontext: what? :)	2010-01-23 19:22:52
<scott___>	bad typing hope you could follow it	2010-01-23 19:23:02
<pmcontext>	im having trouble implementing gradient descent	2010-01-23 19:24:16
	i have one weight for each model	2010-01-23 19:24:37
	example w0 , w1 ,w2 and	2010-01-23 19:25:05
<Shelwien>	well, its not completely smooth	2010-01-23 19:25:27
<pmcontext>	and each model gives n0 n1	2010-01-23 19:25:34
<Shelwien>	just try sampling the function (codelength of weights) with some step by single weight	2010-01-23 19:26:06
	and plot it	2010-01-23 19:26:14
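Shelwien's suggestion above (sample the code length as a function of a single weight, then plot it) could be sketched roughly as follows. This is only an illustration under assumptions: floating-point probabilities, a static two-input linear mix p = (p2-p1)*w + p1, and hypothetical names (`Sample`, `codelength`, `sweep`) that do not come from the chat.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// One coded bit together with the two submodel predictions P(bit==1) made for it.
struct Sample { int bit; double p1, p2; };

// Ideal code length in bits of the whole sequence for a fixed weight w in [0, 1].
double codelength(const std::vector<Sample>& data, double w) {
    assert(w >= 0.0 && w <= 1.0);
    double bits = 0.0;
    for (const Sample& s : data) {
        double p = (s.p2 - s.p1) * w + s.p1;            // linear mix
        p = std::fmin(std::fmax(p, 1e-6), 1.0 - 1e-6);  // keep log2 finite (assumed guard)
        bits -= std::log2(s.bit ? p : 1.0 - p);
    }
    return bits;
}

// Sample the function at steps+1 evenly spaced weights; the result can be plotted.
std::vector<double> sweep(const std::vector<Sample>& data, int steps) {
    assert(steps > 0);
    std::vector<double> out;
    for (int i = 0; i <= steps; ++i)
        out.push_back(codelength(data, double(i) / steps));
    return out;
}
```

Plotting the values returned by `sweep` against the weight shows whether the function has a usable slope near the current weight, which is what a gradient step relies on.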
<pmcontext>	coz i use the n1/(n1+n0) counter , i cant get the prediction directly without division	2010-01-23 19:26:32
<Shelwien>	i don't see how that's a problem	2010-01-23 19:26:49
<pmcontext>	my problem is to adjust a weight w' = w + correction	2010-01-23 19:27:11
<Shelwien>	well, the coef for that correction is important	2010-01-23 19:27:29
	and maybe you lost some sign	2010-01-23 19:27:53
<pmcontext>	yes i think correction = K*(y-p)	2010-01-23 19:27:56
	where y-p is the prediction error	2010-01-23 19:28:06
<Shelwien>	and maybe it gives weird results with |p1-p2| near 0	2010-01-23 19:28:14
<pmcontext>	but i cant have p , coz i use n1 , n0 counts	2010-01-23 19:28:41
<Shelwien>	again, did you maybe forget about fixed-point shifts?	2010-01-23 19:28:51
	like, if you multiply two N-bit values	2010-01-23 19:29:06
<pmcontext>	yes i remember the shifts , k is power of 2	2010-01-23 19:29:11
<Shelwien>	you're supposed to shift it right by N	2010-01-23 19:29:15
<pmcontext>	i mean inverse power	2010-01-23 19:29:32
	so that (y-p) >> 4	2010-01-23 19:29:47
	while updating i dont have P for each model . i have n0 n1 , to get P i need to do n1/(n1+n0) but this will become too many division	2010-01-23 19:30:21
<Shelwien>	you need to compute it before anyway, right? for mixing?	2010-01-23 19:30:52
	why don't you just save the probs at that point?	2010-01-23 19:31:04
<pmcontext>	coz while mixing i dont do divison per model	2010-01-23 19:31:15
<scott___>	gn8	2010-01-23 19:31:36
<pmcontext>	i combine n1 n0 from all models by multiplying weights , then i do one division	2010-01-23 19:31:50
	to get combined P	2010-01-23 19:32:02
*** scott___ has left the channel		2010-01-23 19:32:07
<Shelwien>	well, if you don't, its already a different kind of mixing, it has a different gradient function :)	2010-01-23 19:32:24
<pmcontext>	o.o wat do i do, its been puzzling me	2010-01-23 19:32:41
<Shelwien>	well, i'd suggest to stop trying to make it faster when you didn't make it work yet :)	2010-01-23 19:33:28
<pmcontext>	with fixed weights 239116	2010-01-23 19:34:58
<Shelwien>	is that o3 or what?	2010-01-23 19:35:12
<pmcontext>	it has 5 orders	2010-01-23 19:35:42
<Shelwien>	well, that's not quite good for o4 if that's what you mean (o0-o4)	2010-01-23 19:36:17
<pmcontext>	i use hashtable so some contexts map to the same counter	2010-01-23 19:36:50
	i was thinking i could see how it does with dynamic weights but	2010-01-23 19:37:24
	i got stuck with the weight update problem	2010-01-23 19:37:38
<Shelwien>	well, i suppose you can write a log and see what's your problem where	2010-01-23 19:38:43
	*there	2010-01-23 19:38:46
	1. the weight is supposed to increase for the submodel with best prediction for current symbol	2010-01-23 19:39:29
	2. the repeated mixing for the same inputs with updated weights is supposed to give better probability to current bit	2010-01-23 19:40:05
<pmcontext>	isnt it bad to have 4 divisions ? one per model	2010-01-23 19:44:44
<Shelwien>	it is	2010-01-23 19:44:55
	but why don't you care about that after you get better compression?	2010-01-23 19:45:19
	there're multiple ways to get rid of divisions	2010-01-23 19:45:34
<pmcontext>	ok	2010-01-23 19:45:48
<Shelwien>	btw all these ways generally work at the cost of somewhat worse compression :)	2010-01-23 19:46:29
<pmcontext>	shelwien: have u done gradient based mixer ? any simple one	2010-01-23 20:32:16
<Shelwien>	not linear	2010-01-23 20:32:43
	and not n-ary anyway	2010-01-23 20:32:48
	toffer didn't do it either btw :)	2010-01-23 20:33:03
	its just a theory :)	2010-01-23 20:33:07
<pmcontext>	not linear means ?	2010-01-23 20:33:09
<Shelwien>	well, paq mixer in mix_test is gradient-based	2010-01-23 20:33:33
*** toffer has joined the channel		2010-01-23 20:43:03
<toffer>	hi	2010-01-23 20:43:08
	i just wanted to comment something quickly.	2010-01-23 20:43:15
	gonna go downtown in a few mins	2010-01-23 20:43:22
<pmcontext>	hi	2010-01-23 20:43:48
<toffer>	in the log i've read that you got trouble with gradient descent...	2010-01-23 20:43:52
	well	2010-01-23 20:43:54
<pmcontext>	yes	2010-01-23 20:43:59
<toffer>	to clarify - i implemented it several times	2010-01-23 20:44:03
	in cmm3	2010-01-23 20:44:08
	for n inputs	2010-01-23 20:44:12
	and for two	2010-01-23 20:44:22
	but i used the n input variant only	2010-01-23 20:44:33
	anyway	2010-01-23 20:44:36
<pmcontext>	i have 5 input	2010-01-23 20:44:51
<toffer>	i don't see why you insist on your counter outputting n0 and n1 to be a problem	2010-01-23 20:44:53
	just calculate the probability	2010-01-23 20:45:02
	and use the update formula we derived	2010-01-23 20:45:10
<Shelwien>	i think he mixes the freqs instead	2010-01-23 20:45:22
<toffer>	w' = w + L (p2-p1) (y-p) 1/(p(1-p))	2010-01-23 20:45:26
<Shelwien>	with a single division in the end	2010-01-23 20:45:27
	so there's no separate probabilities	2010-01-23 20:45:38
<toffer>	p = (p2-p1)w + p1	2010-01-23 20:46:03
<Shelwien>	well, he said that divisions are slow :)	2010-01-23 20:46:28
<pmcontext>	p2 and p1 what is it	2010-01-23 20:46:33
<toffer>	input probs	2010-01-23 20:46:39
<pmcontext>	but we have only one p for each model	2010-01-23 20:47:08
<toffer>	and?	2010-01-23 20:47:14
	in cm you want to combine multiple models predictions	2010-01-23 20:47:25
<Shelwien>	mix( p3, mix(p1,p2) )	2010-01-23 20:47:38
<toffer>	yes	2010-01-23 20:47:42
	in that case p = mix(p1,p2) = (p2-p1)w + p1	2010-01-23 20:47:57
<pmcontext>	yes p = w0 * p0 + w1 * p1 + w2 * p2 + w3 * p3	2010-01-23 20:48:15
<toffer>	well	2010-01-23 20:48:27
	as i said	2010-01-23 20:48:30
<pmcontext>	p = (w0 * p0 + w1 * p1 + w2 * p2 + w3 * p3) / ( w0 + w1 + w2 + w3) ,	2010-01-23 20:49:01
<toffer>	...	2010-01-23 20:49:13
	!grep (1-u)	2010-01-23 20:49:26
	that is what these weights look like after cascading several n input mixers	2010-01-23 20:49:56
	and u and v are limited to [0, 1]	2010-01-23 20:50:12
<pmcontext>	oh i see	2010-01-23 20:50:19
<toffer>	to clarify	2010-01-23 20:50:28
	mix(p1,p2) = q	2010-01-23 20:50:41
	mix(q,p3) = final prediction	2010-01-23 20:50:48
	q = (1-u) p1 + u p2, right?	2010-01-23 20:51:05
<pmcontext>	yes	2010-01-23 20:51:21
<toffer>	final prediction - we'd call it p'	2010-01-23 20:51:29
	p' = (1-v) q + v p3	2010-01-23 20:51:42
	right?	2010-01-23 20:51:43
<pmcontext>	yes	2010-01-23 20:51:55
<toffer>	now substitute q into p' and compare the weight coeffs to p' = w1 p1 + w2 p2 + w3 p3	2010-01-23 20:52:24
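Carrying out the substitution toffer asks for, with q = (1-u) p1 + u p2 and p' = (1-v) q + v p3, the cascade turns out to be a three-input linear mix whose weights sum to 1. The symbols w1, w2, w3 are the ones from the line above.

```latex
\begin{aligned}
p' &= (1-v)\,q + v\,p_3 \\
   &= (1-v)(1-u)\,p_1 + (1-v)\,u\,p_2 + v\,p_3, \\
w_1 &= (1-u)(1-v), \quad w_2 = u\,(1-v), \quad w_3 = v, \\
w_1 + w_2 + w_3 &= (1-v)\bigl((1-u)+u\bigr) + v = 1 .
\end{aligned}
```

Since u and v are limited to [0, 1], each w_i is non-negative, so no separate normalisation by the sum of weights is needed.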
<pmcontext>	in weight update	2010-01-23 20:53:59
	w' = w + L (p2-p1) (y-p) 1/(p(1-p)) , here i dont understand	2010-01-23 20:54:01
	this we do for each update ?	2010-01-23 20:54:02
	w0 = w0 + L( ? - ? ) * 1/ p(1-p)	2010-01-23 20:54:04
<toffer>	you're fine in using the recursive formulation	2010-01-23 20:54:40
	you got	2010-01-23 20:54:43
	q = mix(p1,p2)	2010-01-23 20:54:48
	q = (p2-p1)u + p1	2010-01-23 20:54:57
	right?	2010-01-23 20:55:00
<pmcontext>	yes	2010-01-23 20:55:11
<toffer>	thus you compute u' = u + L (p2-p1) (y-q) 1/(q(1-q))	2010-01-23 20:55:32
	and you got p' = mix(q,p3)	2010-01-23 20:55:47
	now how do you update the weight v in there?	2010-01-23 20:55:54
<pmcontext>	w' = w + L ( p2-p1) * 1/p(1-P)	2010-01-23 20:56:53
<toffer>	no	2010-01-23 20:57:01
	maybe the terming confuses you	2010-01-23 20:57:17
	say	2010-01-23 20:57:39
	c = mix(a, b)	2010-01-23 20:58:01
	c = (b-a) w + a	2010-01-23 20:58:11
<pmcontext>	ok	2010-01-23 20:58:27
<toffer>	c' = c + L (b-a) (y-c) 1/(c(1-c))	2010-01-23 20:58:37
	now you got the mixer	2010-01-23 20:58:56
	q = mix(p1,p2)	2010-01-23 20:59:06
	mistake	2010-01-23 20:59:21
	w' = w + L (b-a) (y-c) 1/(c(1-c))	2010-01-23 20:59:24
	now you got, q = mix(p1,p2) - thus c=q, a=p1, b=p2	2010-01-23 20:59:48
	w' = w + L (b-a) (y-c) 1/(c(1-c)) becomes?	2010-01-23 21:00:05
<pmcontext>	w' = w + L (p2 - p1) (y-q) 1/q(1-q)	2010-01-23 21:00:50
<toffer>	ok	2010-01-23 21:00:57
<pmcontext>	i was trying to update the weights independently	2010-01-23 21:01:20
<toffer>	and now you got p' = mix(q, p3) and that mixer got the weight v (not to confuse it with w from the previous level)	2010-01-23 21:01:31
<pmcontext>	ok new is v	2010-01-23 21:01:55
<toffer>	and why?	2010-01-23 21:01:59
	just try the current solution	2010-01-23 21:02:05
	and improve afterwards	2010-01-23 21:02:10
<pmcontext>	i duno >< , ok	2010-01-23 21:02:14
<toffer>	i have to leave now	2010-01-23 21:02:29
	gn8 and good luck	2010-01-23 21:02:32
<pmcontext>	thank you	2010-01-23 21:02:38
*** toffer has left the channel		2010-01-23 21:02:41
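The cascade discussed above, mix(mix(p1,p2),p3) with each mixer updating its own weight by w' = w + L (b-a) (y-c) 1/(c(1-c)), could look roughly like this in floating point. It is a sketch, not anyone's actual implementation: the names `Mixer2` and `Cascade3`, the learning rate 0.02 and the guard threshold 1e-3 are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>

// Two-input linear mixer: c = (b - a) * w + a, with w kept in [0, 1].
struct Mixer2 {
    double w = 0.5;           // weight of the second input
    double rate = 0.02;       // learning rate L (assumed value, needs tuning)
    double a = 0.5, b = 0.5;  // inputs of the last mix() call
    double c = 0.5;           // output of the last mix() call

    double mix(double p1, double p2) {
        a = p1; b = p2;
        c = (b - a) * w + a;
        return c;
    }

    // y is the bit that was actually coded.
    // w' = w + L (b-a) (y-c) / (c (1-c))
    void update(int y) {
        assert(y == 0 || y == 1);
        const double t = 1e-3;                // assumed guard threshold
        if (c < t || c > 1.0 - t) return;     // 1/(c(1-c)) explodes near 0 and 1
        w += rate * (b - a) * (y - c) / (c * (1.0 - c));
        w = std::max(0.0, std::min(1.0, w));  // keep w in [0, 1]
    }
};

// Cascade for three submodel predictions: mix(mix(p1, p2), p3).
struct Cascade3 {
    Mixer2 first, second;
    double predict(double p1, double p2, double p3) {
        return second.mix(first.mix(p1, p2), p3);
    }
    // Each level is updated against its own output, as in the chat.
    void update(int y) { first.update(y); second.update(y); }
};
```

Both sanity checks Shelwien listed earlier apply directly: after `update(1)` the weight should move toward the input that gave the higher probability of 1, and mixing the same inputs again should give a better probability for that bit.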
<Shelwien>	btw, you can try a simpler thing first	2010-01-23 21:10:13
	adaptive mix only for 2 inputs of 5	2010-01-23 21:10:56
	it should improve compression anyway	2010-01-23 21:11:30
	and when it'd work, you'd be able to extend it	2010-01-23 21:11:48
<pmcontext>	ok	2010-01-23 21:17:20
	i g2g bye and thanks	2010-01-23 21:25:22
*** pmcontext has left the channel		2010-01-23 21:25:44
*** STalKer-Y has joined the channel		2010-01-24 04:21:18
*** STalKer-X has left the channel		2010-01-24 04:22:21
*** pinc has joined the channel		2010-01-24 10:57:24
*** mike_____ has joined the channel		2010-01-24 11:00:11
*** STalKer-Y has left the channel		2010-01-24 13:25:24
*** STalKer-X has joined the channel		2010-01-24 13:26:30
*** Shelwien has left the channel		2010-01-24 13:27:26
*** Guest9968193 has joined the channel		2010-01-24 13:27:30
*** pinc has left the channel		2010-01-24 13:39:06
*** pmcontext has joined the channel		2010-01-24 14:22:38
*** scott___ has joined the channel		2010-01-24 14:45:16
*** scott___ has left the channel		2010-01-24 14:55:14
*** Krugz has left the channel		2010-01-24 16:18:15
*** toffer has joined the channel		2010-01-24 16:24:14
<toffer>	hi	2010-01-24 16:28:14
<pmcontext>	hii	2010-01-24 16:29:44
	toffer im working on the mixer , and jst got back again	2010-01-24 16:30:43
*** ARTHURIUSS has joined the channel		2010-01-24 16:36:43
<toffer>	so you're doing cascaded two input mixers now?	2010-01-24 16:37:33
	hi ARTHURIUSS - are you new here?	2010-01-24 16:40:05
<ARTHURIUSS>	yeah..	2010-01-24 16:40:29
<toffer>	are you on the forum, too?	2010-01-24 16:40:39
<ARTHURIUSS>	nooo..just a random passerby looking for a matlab program for an IEEE paper	2010-01-24 16:41:24
<toffer>	compression relaetd?	2010-01-24 16:42:27
	related	2010-01-24 16:42:28
<ARTHURIUSS>	hmm..actually it's image encryption and sharing....	2010-01-24 16:43:07
	using shadow images..	2010-01-24 16:43:32
<toffer>	don't know about that	2010-01-24 16:44:39
	and how'd you come across this channel then?	2010-01-24 16:44:47
*** toffer has left the channel		2010-01-24 16:45:38
*** toffer has joined the channel		2010-01-24 16:47:01
*** ARTHURIUSS has left the channel		2010-01-24 16:56:42
<pmcontext>	yes 2 input mixer	2010-01-24 17:00:27
*** scott___ has joined the channel		2010-01-24 17:01:58
<scott___>	hello where is !Shelwien today	2010-01-24 17:02:32
	while toffer maybe you can anwser	2010-01-24 17:03:47
	well toffer	2010-01-24 17:03:53
<pmcontext>	toffer: w' = w + L (b-a) (y-c) 1/(c(1-c)) , do i really multiply by 1/(c(1-c))	2010-01-24 17:04:13
	or can that term be removed ?	2010-01-24 17:04:14
<toffer>	you need to...	2010-01-24 17:04:36
	otherwise you're not following the entropy metric	2010-01-24 17:04:50
<pmcontext>	ok	2010-01-24 17:05:31
<scott___>	is Ilia Muraviev the same as Shelwien?	2010-01-24 17:08:19
<toffer>	erm	2010-01-24 17:08:45
	pretty funny	2010-01-24 17:08:49
	no, of course no	2010-01-24 17:08:52
	t	2010-01-24 17:08:54
	not	2010-01-24 17:08:56
<scott___>	ok I was wondering since I have been looking at fpaq0mw that Shel... gave me but it has the other guys name who did it?	2010-01-24 17:10:29
<toffer>	shelwien just inserted a liner mixer implementatio	2010-01-24 17:10:46
	inplementation	2010-01-24 17:10:51
	implementation	2010-01-24 17:10:54
	damn!	2010-01-24 17:10:56
	guess i'm still drunk	2010-01-24 17:11:00
<scott___>	from beer?	2010-01-24 17:11:08
<toffer>	yesterday i officially celebrated my graduation	2010-01-24 17:12:19
	now i'm officially an engineer	2010-01-24 17:12:29
<pmcontext>	congrats !!!	2010-01-24 17:12:34
<toffer>	dipl.-ing. in german	2010-01-24 17:12:39
	thanks	2010-01-24 17:12:43
<pmcontext>	:D	2010-01-24 17:12:45
<scott___>	anyway, trying to modify it to compress this DNA packed file and it does not seem to compress better with mixing	2010-01-24 17:12:53
	now you can drive a train	2010-01-24 17:13:07
<pmcontext>	shelwien inserted on website ?	2010-01-24 17:13:27
<toffer>	not really	2010-01-24 17:13:36
<scott___>	congrats what kind of enginer	2010-01-24 17:13:37
<toffer>	for modelling, automation and control	2010-01-24 17:13:59
<scott___>	my BSEE was in fields and waves while MSEE was in control theory	2010-01-24 17:14:22
<toffer>	sounds like theoretical electectrical engineering	2010-01-24 17:15:25
	your bachelor	2010-01-24 17:15:36
	control theory is equivalent to automation and control i guess	2010-01-24 17:16:15
	and in addition i got the best possible graduation	2010-01-24 17:16:34
	^^	2010-01-24 17:16:37
<scott___>	yes I switched from math to electronics since it had less english and bs humanities classes. I went to school on a math scholarship but got more real math in engineering	2010-01-24 17:16:45
	what level BS MS or Phd	2010-01-24 17:17:16
*** Shelwien has joined the channel		2010-01-24 17:17:19
<toffer>	it's equal to ms	2010-01-24 17:17:37
<pmcontext>	darn it giving some big number	2010-01-24 17:17:53
<toffer>	i gonna do my phd than. probably at the local fraunhofer	2010-01-24 17:17:57
<scott___>	good, did you have to write a thesis	2010-01-24 17:18:00
<toffer>	of course	2010-01-24 17:18:07
	it was about modelling and predicting stochastic processes with pattern characteristics	2010-01-24 17:18:36
	i made some context mixing predictor	2010-01-24 17:18:42
	which outperformed the previous solution	2010-01-24 17:19:12
<scott___>	I was luck at USC that had two options. One was to take a test and pass an MS level and wirte a paper the other option was to pass the test ast the equivalent of Phd level I choose that path sa as not to wirte anything	2010-01-24 17:19:19
<pmcontext>	o.o awesome!!	2010-01-24 17:19:27
<scott___>	lucky	2010-01-24 17:19:34
<toffer>	that's pretty weird	2010-01-24 17:20:05
<scott___>	they offered my to stay in school to get a Phd in Plasma Physics but I shoose not to.	2010-01-24 17:20:09
	chose not too.	2010-01-24 17:20:26
*** Guest9968193 has left the channel		2010-01-24 17:20:38
<toffer>	is that a long time ago?	2010-01-24 17:20:42
<scott___>	that was in the early 70's for ms	2010-01-24 17:21:09
<toffer>	really?	2010-01-24 17:21:38
	i thought you're younger	2010-01-24 17:21:43
<scott___>	USC is rated high though. I hated it compared to ASU	2010-01-24 17:21:51
<toffer>	you must be like 60 now?	2010-01-24 17:21:51
<scott___>	yes in my 60's	2010-01-24 17:22:01
	thats not my picture at your site	2010-01-24 17:22:17
	USC university of southern california, ASU arizona state university	2010-01-24 17:23:12
<toffer>	"not my picture on your site"?	2010-01-24 17:23:35
<scott___>	not my picture on any site.	2010-01-24 17:23:47
<toffer>	that image on your profile?	2010-01-24 17:24:38
<scott___>	is just an avatar. who uses their real picture?	2010-01-24 17:25:28
<toffer>	well i do	2010-01-24 17:25:52
<scott___>	well you look better than me	2010-01-24 17:26:03
<toffer>	i'm just younger	2010-01-24 17:26:12
<pmcontext>	im more younger	2010-01-24 17:26:26
	:D	2010-01-24 17:26:29
<toffer>	^^	2010-01-24 17:26:40
<pmcontext>	-----------------------------------------------------	2010-01-24 17:26:46
	strange i got w = 5 , p1 = 4 , p2 = 5	2010-01-24 17:26:48
	p = (w*(p1-p2)+p2) = 0 , resulted in 0	2010-01-24 17:26:49
	, my mixer update bombed due to division by zero	2010-01-24 17:26:51
	dw = ((p2-p1)((b<<12)-p)/(p(4096-p)))>>4 , since p is 0 the division in 1/(p(1-p)) blew up	2010-01-24 17:26:52
<scott___>	I had a fu manchu mustache in early college years with my motorcycle. if I ever find it I may use that as my pic	2010-01-24 17:27:15
<toffer>	1. p = (1-w) p1 + w p2 = (p2-p1) w + p1	2010-01-24 17:28:19
	2. i told you that for p<t, p>=1-t you should make g(p) = 1/(p(1-p)) equal to zero for some small threshold t	2010-01-24 17:29:00
	you mean that guy from some movie?	2010-01-24 17:29:30
	dr. fu manchu	2010-01-24 17:29:33
	such a picture would be a good avatar i guess	2010-01-24 17:29:49
<pmcontext>	oh my bad	2010-01-24 17:31:07
<scott___>	I don't have a PhD. in fact when working with Gil, he's the one that wrote the paper, I just BSed on the phone. after he got it to work he asked about my PhD and where I taught	2010-01-24 17:31:15
	The only teaching was at a junior college where the kids never learned to add fractions and I did not want that school even mentioned	2010-01-24 17:32:11
	I have never talked to any on voice the forum but Mike boss and I talked on the phone.	2010-01-24 17:33:05
<toffer>	somehow i feel kids get dumber the more time passes by	2010-01-24 17:33:07
<pmcontext>	p=(w*(p1-p2)+p2); for this should i do a check for the range of p ?	2010-01-24 17:34:11
	after p is computed	2010-01-24 17:34:22
<toffer>	1. p = (1-w) p1 + w p2 = (p2-p1) w + p1	2010-01-24 17:35:32
	your formula is still wrong	2010-01-24 17:35:39
	see?	2010-01-24 17:36:00
	you just need to make sure w is in [0, 1]	2010-01-24 17:36:44
<scott___>	it makes no difference his formula same as yours	2010-01-24 17:37:26
<toffer>	no	2010-01-24 17:37:41
	(p2-p1) w + p1 vs. w*(p1-p2)+p2	2010-01-24 17:37:57
	weight update relies on the order of p1 and p2	2010-01-24 17:38:22
	it must be p2-p1 but not p1-p2	2010-01-24 17:38:31
	anyway gonna go for dinner now	2010-01-24 17:38:39
<scott___>	sorry	2010-01-24 17:39:11
<pmcontext>	coz in mix (mix( p1 , p2 ) , p3 ) ?	2010-01-24 17:39:13
	or mix (p3 , mix( p1 , p2 ) )	2010-01-24 17:39:15
<toffer>	because it's mix(a,b) but not mix(b,a) with a=p1, b=p2	2010-01-24 17:39:45
<pmcontext>	oh ok	2010-01-24 17:40:14
<toffer>	same for a=p3, b=mix(p1,p2)	2010-01-24 17:40:17
	and so on...	2010-01-24 17:40:20
<pmcontext>	yes i und	2010-01-24 17:40:23
<toffer>	you just need to make a single mixer class	2010-01-24 17:40:28
	with something like int Mix(int p1, int p2)	2010-01-24 17:40:45
<scott___>	yes I was wrong i really am not into this yet	2010-01-24 17:40:48
<toffer>	anyway dinner calls	2010-01-24 17:41:03
	good luck	2010-01-24 17:41:04
<pmcontext>	i made one	2010-01-24 17:41:18
	class MIXER {	2010-01-24 17:41:19
	public :	2010-01-24 17:41:21
	int w;	2010-01-24 17:41:22
	int p,p1,p2;	2010-01-24 17:41:24
	MIXER(){w=32;}	2010-01-24 17:41:25
	int mix(int _p1, int _p2){	2010-01-24 17:41:27
	p1=_p1;	2010-01-24 17:41:28
	p2=_p2;	2010-01-24 17:41:30
	p=((w*(p2-p1))>>6)+p1; // w is a 6-bit fixed-point weight, 64 == 1.0	2010-01-24 17:41:31
	p = (p<0 ? 0 : (p>4095 ? 4095 : p));	2010-01-24 17:41:33
	return p;	2010-01-24 17:41:34
	}	2010-01-24 17:41:36
	void update(int b){	2010-01-24 17:41:37
	// working on this ...	2010-01-24 17:41:39
	w += dw;	2010-01-24 17:41:40
	w = (w<0 ? 0 : (w>64 ? 64 : w));	2010-01-24 17:41:42
	}	2010-01-24 17:41:44
	void printw(){printf("w=%d\n",w);}	2010-01-24 17:41:45
	};	2010-01-24 17:41:47
	ok see you after dinner	2010-01-24 17:42:03
<scott___>	how do you determine, between two models, which to make P1 and which to make P2? just try both and see what happens, or what?	2010-01-24 17:42:12
<pmcontext>	we use this order mix ( mix (p1 , p2 ) , p3 )	2010-01-24 17:42:51
<scott___>	but aren't the Ps different models?	2010-01-24 17:43:26
<pmcontext>	yes	2010-01-24 17:44:33
<scott___>	if I have 2 different models P1 and P2, you're saying that the mixing depends on which I call P1 and P2? if I reverse the order I get a different answer for the final guess of P	2010-01-24 17:45:08
*** mike_____ has left the channel		2010-01-24 17:47:13
*** mike_____ has joined the channel		2010-01-24 17:52:47
<toffer>	storing p1 and p2 is redundant	2010-01-24 18:05:27
	int mix( const int dp, const int p1 ) { return w*dp+p1; } call it with mix(p2-p1,p1).	2010-01-24 18:05:58
	void update(int dp, int e, int p) { if (p>T && p<=ONE-T) w=max(0, min(W_ONE, (int)w + ((dp*e/(p*(ONE-p)))>>L) )); }	2010-01-24 18:09:17
	and add some symbolic constants	2010-01-24 18:09:23
	like mixer { static const int ONE=1<<6; // 6 bit weights ... }	2010-01-24 18:09:49
	gn8 guys	2010-01-24 18:15:43
*** toffer has left the channel		2010-01-24 18:15:57
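A fixed-point variant along the lines toffer outlines (mix(dp, p1), update(dp, e, p), symbolic constants, and the threshold T below which 1/(p(1-p)) is treated as zero) might look like the following. It is a sketch under assumptions: the struct name `FixedMixer`, the 16-bit weight precision, T = 16 and L = 6 are illustrative choices, and 64-bit intermediates are used so the products cannot overflow.

```cpp
#include <cassert>
#include <cstdint>

// Fixed-point two-input mixer: 12-bit probabilities, 16-bit weight.
struct FixedMixer {
    enum { ONE   = 1 << 12,   // probability 1.0
           W_ONE = 1 << 16,   // weight 1.0 (assumed precision)
           T     = 16,        // assumed threshold around 0 and ONE
           L     = 6 };       // learning rate 2^-L (assumed)
    int w = W_ONE / 2;

    static std::int64_t clamp(std::int64_t v, std::int64_t lo, std::int64_t hi) {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    // dp = p2 - p1. Returns p = p1 + w*dp, kept strictly inside (0, ONE).
    int mix(int dp, int p1) const {
        std::int64_t p = p1 + std::int64_t(w) * dp / W_ONE;
        return int(clamp(p, 1, ONE - 1));
    }

    // e = (y << 12) - p is the prediction error of the mixed output p.
    void update(int dp, int e, int p) {
        assert(p > 0 && p < ONE);
        if (p <= T || p >= ONE - T) return;  // treat 1/(p(1-p)) as zero here
        // dp*e/(p*(ONE-p)) is scale-free, so multiply by W_ONE to get weight units.
        std::int64_t dw = std::int64_t(dp) * e * W_ONE / (std::int64_t(p) * (ONE - p));
        w = int(clamp(w + dw / (1 << L), 0, W_ONE));
    }
};
```

The caller keeps dp and p between the two calls, so the mixer itself stores only the weight, which is the point of toffer's remark that storing p1 and p2 is redundant.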
*** pmcontext has left the channel		2010-01-24 18:38:27
<scott___>	hello !Shelwien	2010-01-24 18:45:21
<Shelwien>	too late today, i guess :)	2010-01-24 18:59:19
<scott___>	still here	2010-01-24 19:04:45
<Shelwien>	hi :)	2010-01-24 19:04:53
<scott___>	I have some questions about fpaq0mw	2010-01-24 19:05:24
<Shelwien>	?	2010-01-24 19:05:51
<scott___>	well for one thing when I use the dna file it seems to expand	2010-01-24 19:06:28
	how do you pick the P1 equation and the P2 equation	2010-01-24 19:06:54
	one you shift by 4 and the other by 8	2010-01-24 19:07:21
<Shelwien>	more or less random	2010-01-24 19:07:44
	it was just a demo of dynamic mixing	2010-01-24 19:07:56
<scott___>	so one would have to play with a bunch of different shifts for p1 and p2 to get the best	2010-01-24 19:08:49
<Shelwien>	i'd prefer something like in http://www.ctxmodel.net/rem.pl?-6 instead	2010-01-24 19:09:42
<scott___>	here is the thing, I was playing and lost the modified arb255 which uses a single P based on 64bit count registers for ones and zeros	2010-01-24 19:10:03
	and thought, since it's static, yours would do much better	2010-01-24 19:10:22
	fpaq0pv4B is this what you mean?	2010-01-24 19:11:24
<Shelwien>	specifically the version linked there	2010-01-24 19:11:46
<scott___>	that is the version linked there	2010-01-24 19:12:17
<Shelwien>	well, anyway, the update function is described in the text, along with optimal parameter values for different types of data	2010-01-24 19:12:35
	which "that"? http://ctxmodel.net/files/fpaq0pv4B_wr.rar is linked there	2010-01-24 19:12:58
	and there're a few other branches of the same coder	2010-01-24 19:13:10
<scott___>	but then you would need to know the int wr_mw[][2] = { values which are different for each file.	2010-01-24 19:13:58
<Shelwien>	why, you do need to know these	2010-01-24 19:15:21
	of course you can select some parameters (for example, by the table there) with good average results	2010-01-24 19:15:57
<scott___>	well for one thing I am using the compressed 2 bits / symbol DNA file so would have to guess which one is correct	2010-01-24 19:16:30
<Shelwien>	try them all? :)	2010-01-24 19:17:19
<scott___>	well if it compresses good would you later tune it to a better set	2010-01-24 19:17:57
	you have a fast way of tuning you models.	2010-01-24 19:18:22
	your models	2010-01-24 19:19:03
<Shelwien>	well, there's no better way than trying all the parameter sets anyway :)	2010-01-24 19:19:44
	but in the end we have to use some heuristics	2010-01-24 19:21:13
<scott___>	you mean guess	2010-01-24 19:22:10
<Shelwien>	more or less	2010-01-24 19:22:21
	its a guess, but its based on some assumptions about the behavior of the codelength function	2010-01-24 19:23:22
<scott___>	Predictor(): cxt(1) { this line confuses me I no C better but what does the " cxt(1) " do I had to create other varibles and just later did the x = 1; type of thing but why no commas or ; stuff	2010-01-24 19:25:23
	i know c	2010-01-24 19:25:39
	I can edit matts and your code but not really familiar with just what the line above does.	2010-01-24 19:26:24
<Shelwien>	Predictor(): cxt(1) {...} is equal to Predictor() { cxt=1; ... } in C++	2010-01-24 19:26:42
<scott___>	so its designed to confuse c coders	2010-01-24 19:28:39
	can you do then Predictor(): cxt(1) : xzy(2) to mean cxt = 1; xzy = 2;	2010-01-24 19:29:48
<Shelwien>	yea, but Predictor(): cxt(1), xzy(2)	2010-01-24 19:34:17
	and both cxt and xzy have to be class members	2010-01-24 19:34:34
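The equivalence Shelwien describes, a member initializer list versus assignments in the constructor body, can be shown with two small classes. `PredictorA`, `PredictorB` and `check_equivalent` are hypothetical stand-ins, not the actual Predictor from fpaq0, and the member names follow the chat.

```cpp
#include <cassert>

// Members set through the member initializer list, comma separated.
class PredictorA {
public:
    int cxt, xzy;
    PredictorA() : cxt(1), xzy(2) {}
};

// Same members, assigned inside the constructor body instead.
class PredictorB {
public:
    int cxt, xzy;
    PredictorB() { cxt = 1; xzy = 2; }
};

// Both forms leave the object in the same state for plain int members.
inline void check_equivalent() {
    PredictorA a;
    PredictorB b;
    assert(a.cxt == b.cxt && a.xzy == b.xzy);
}
```

For plain `int` members the two forms behave the same. The initializer list only becomes mandatory for `const` members, references, and members without a default constructor.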
<scott___>	ok thanks. I modded your code and had to do the xyz = 1; thing. in my mind I was worried there may have been side effects that I could not follow, but you have cleared it up, thank you	2010-01-24 19:35:54
	I will play with the way I have changed fpaq0mw, then if it fails to get very good I think I will go back to kludging up arb255 to handle the mess in a stationary way. when done I may try to add windowing probability	2010-01-24 19:38:01
<Shelwien>	note that fpaq0mw is still order0	2010-01-24 19:38:31
	and it has to encode 8 bits per symbol for DNA data	2010-01-24 19:38:49
<scott___>	I don't think I have a good enough way to handle nonstationary	2010-01-24 19:38:58
	no i changed it, it handles more	2010-01-24 19:39:15
	yes I got good results from modified arb255 though, order 0 too	2010-01-24 19:39:53
	will go to order one later	2010-01-24 19:40:17
	the numbers I gave you the other day were for an order 0 stationary code of e.coli	2010-01-24 19:40:52
<Shelwien>	well, with bits 7..0 it encodes e.coli to 1197655	2010-01-24 19:41:26
	but with only bits 2..1 it encodes it to 1147130	2010-01-24 19:41:40
<scott___>	what do you mean two bits?	2010-01-24 19:42:49
	mix 6 8 1143894 in sa	2010-01-24 19:43:08
<Shelwien>	i don't know what you mean either :)	2010-01-24 19:44:00
	but bits 2..1 of acgt are enough to identify all symbols	2010-01-24 19:44:22
<scott___>	by two 2 do you mean regular compacted to 4 per character	2010-01-24 19:44:57
	a 00 c 01 g10 t 11	2010-01-24 19:45:16
<Shelwien>	a 61 = 01100<00>1	2010-01-24 19:46:13
	c 63 = 01100<01>1	2010-01-24 19:46:13
	g 67 = 01100<11>1	2010-01-24 19:46:13
	t 74 = 01110<10>0	2010-01-24 19:46:13
<scott___>	and	2010-01-24 19:47:13
	look, I compress e.coli to a file of 1159673 using a = 00 c = 01 g = 10 t = 11	2010-01-24 19:48:45
	then compress that with a modified fpaq0mw	2010-01-24 19:49:08
	this is what I meant. what do you mean? what did you compress?	2010-01-24 19:49:34
<Shelwien>	a=00 c=01 g=11 t=10 i guess	2010-01-24 19:50:49
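Shelwien's "bits 2..1" mapping follows directly from the ASCII codes he lists (a = 0x61, c = 0x63, g = 0x67, t = 0x74). A minimal sketch, with `acgt_code` as a hypothetical helper name:

```cpp
#include <cassert>

// Map a lowercase nucleotide letter to a 2-bit code using bits 2..1 of its
// ASCII value: a -> 00, c -> 01, g -> 11, t -> 10.
inline int acgt_code(char c) {
    assert(c == 'a' || c == 'c' || c == 'g' || c == 't');
    return (c >> 1) & 3;
}
```

This is a different assignment from scott's a = 00 c = 01 g = 10 t = 11, since g and t are swapped, which is why the two were comparing different 2-bit files.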
<scott___>	ok so when you do that you end up doing zero order on two bit fields	2010-01-24 19:52:27
<Shelwien>	sure, and there is some compression	2010-01-24 19:53:16
<scott___>	you only need like int p[4;	2010-01-24 19:53:23
	p[4]	2010-01-24 19:53:38
<Shelwien>	but obviously fpaq0 parameters are not optimized for dna data	2010-01-24 19:53:42
<scott___>	yes that why you need so other mods	2010-01-24 19:54:00
	some ohter mods to it	2010-01-24 19:54:14
	to start with p[4] is too small, p[64] works better	2010-01-24 19:54:56
	why grab 2 bits when 6 is better	2010-01-24 19:56:06
<Shelwien>	well, sure you need a higher order context there	2010-01-24 19:57:25
<scott___>	using the fpaq0mw model I changed the number of bits grabbed to make a symbol. in your model 6 bits is the best. When I did it on arb, 12 bits was the best.	2010-01-24 20:03:43
	so I am further changing the model and working with 12 till I add in all the features I had in arb	2010-01-24 20:04:27
	like it runs I compress and uncompress back and do a fc /b on the files.	2010-01-24 20:04:56
<Shelwien>	you can also try assigning various permutations of 2 bits to symbols	2010-01-24 20:05:04
<scott___>	can show you the code at this point if you want to see it	2010-01-24 20:06:16
<Shelwien>	i mean, there're 24 permutations of acgt	2010-01-24 20:06:25
<scott___>	I can show ...	2010-01-24 20:06:25
<Shelwien>	and you can map any of them to 00 01 10 11	2010-01-24 20:06:40
	and that would affect compression with a bitwise coder	2010-01-24 20:07:05
<scott___>	you're right any would have worked	2010-01-24 20:07:20
	only if you count with one p there would be two answers. if you p[2] there are 4 separate leaves so no real difference in compressed size	2010-01-24 20:08:24
	in short if 100 a's 1 c 1 g 100 t then a 00c 01g 10 t 11 gives a bad answes	2010-01-24 20:09:57
	bad chocie	2010-01-24 20:10:56
<Shelwien>	not really... compression is determined by entropy model	2010-01-24 20:11:30
<scott___>	better 100 a 200 c 300 g 400 t the ratio of 1 ot 0 would depend on how you assing a c g t	2010-01-24 20:12:31
	but the relative ratiors of a c g t the same not matter who you assing it	2010-01-24 20:12:58
<Shelwien>	and in dna A and T, C and G are "complementary"	2010-01-24 20:13:05
<scott___>	how you assign it	2010-01-24 20:13:12
	on opposite strands	2010-01-24 20:13:44
<Shelwien>	so for a context AATCCG you can sometimes find a corresponding TTAGGC	2010-01-24 20:14:00
<scott___>	but if you look at e.coli you find all 4 have a different count	2010-01-24 20:14:11
	that could be account for in the model	2010-01-24 20:15:21
	accounted for in the model	2010-01-24 20:15:34
	in fact i think any combination is possible	2010-01-24 20:15:54
	looking at the first order model (which is not done) I can see that groupings of 3 (codons) and 6 are very good. when I varied the number of bits in a symbol, at least in arb, I got a minimum at 3 symbols (6 bits) and a bigger one at groupings of 6 (12 bits)	2010-01-24 20:18:09
	on the fpaq model 3 was best but 6 was a local best, which is 12 bits; 10, 11, 13, 14 were all bad	2010-01-24 20:19:28
	so decided to make the basic symbol size 6 units or 12 bits of the packed dna	2010-01-24 20:20:15
	does this make sense	2010-01-24 20:20:28
	now I have not done the frame reset in the fp version that made a big difference in the arb version	2010-01-24 20:21:22
	any questions? do any of the butchered lines I wrote need clarifying?	2010-01-24 20:27:13
	here is what I think it is you see a file as characters.	2010-01-24 20:28:22
	I see a file as ones and zeroes.	2010-01-24 20:28:34
	i see the normal DNA acgt character stuff as a waste of ones and zeroes,	2010-01-24 20:29:06
<Shelwien>	whatever... the point is that you can assign different binary codes to input symbols	2010-01-24 20:29:41
<scott___>	so I change it to a file of ones and zeroes where every 2 bits means a character in the old file set	2010-01-24 20:29:42
	however I then group so that when viewed from the outside I use 6 characters as a basic symbol	2010-01-24 20:31:17
	so know 64 leaves while running is the whole space of symbols.	2010-01-24 20:32:03
<Shelwien>	but with bitwise coding	2010-01-24 20:32:34
<scott___>	now and know are always a bitch, I get the two confused in typing	2010-01-24 20:32:39
<Shelwien>	you assign probabilities not only to symbols	2010-01-24 20:32:42
	but to sets of symbols as well	2010-01-24 20:32:46
<scott___>	are you asking a question or correcting what I said or what?	2010-01-24 20:33:33
<Shelwien>	that's why your binary decompositions of symbols affect the compression	2010-01-24 20:33:51
	unrelated to the efficiency of the bitcode	2010-01-24 20:34:09
<scott___>	I am actually only looking at large symbol sets, but I am always making them binary	2010-01-24 20:34:52
	sort of the way fpaq0 is only looking at 257 symbols	2010-01-24 20:35:29
<Shelwien>	as i said, there're different binary codes for the same symbols	2010-01-24 20:35:46
	like a:00 c:01 g:10 t:11	2010-01-24 20:35:48
<scott___>	they use 8 bits for the bytes and another, wastefully, for the EOF	2010-01-24 20:35:55
<Shelwien>	and a:00 c:10 g:11 t:01	2010-01-24 20:35:58
	and that affects the following compression with a bitwise model like fpaq0	2010-01-24 20:36:42
<scott___>	yes, and in this case, if the symbol tree is even in length, you get the same compressed length no matter how you assign them	2010-01-24 20:37:00
<Shelwien>	that "length" only affects compression speed, not ratio	2010-01-24 20:37:38
<scott___>	take any file, swap character codes and compress again with fpaq0: you should get the same length except for round offs	2010-01-24 20:38:15
	bwt of character makes a big difference but zero order makes no difference.	2010-01-24 20:39:08
<Shelwien>	there's no difference only with alphabet coding	2010-01-24 20:40:10
	but with bitwise coding there's a significant difference	2010-01-24 20:40:24
<scott___>	you're right, it will make a difference if you group based on closeness	2010-01-24 20:41:09
	you're right, I was wrong	2010-01-24 20:41:18
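The point about code assignment can be checked on a toy. The sketch below is not fpaq0 or arb255; it assumes an order-0 bitwise model with three binary nodes (first bit, then second bit under each first bit) and a plain Laplace counter at each node. The names `bitwise_prob` and `base_index` are made up for illustration.

```c
#include <stddef.h>

/* Map a base to 0..3 (anything other than a, c, g is treated as t). */
static int base_index(char c) {
    switch (c) {
    case 'a': return 0;
    case 'c': return 1;
    case 'g': return 2;
    default:  return 3;
    }
}

/* Probability an adaptive bitwise model assigns to the string s when base i
   is written as the 2-bit code code[i]. Node 0 codes the first bit, nodes 1
   and 2 code the second bit after a first bit of 0 or 1. Each node uses the
   Laplace estimate (n_b + 1) / (n0 + n1 + 2). The code length is -log2 of
   the result; staying in the probability domain avoids libm but only works
   for short inputs, since long ones underflow. */
double bitwise_prob(const char *s, size_t len, const int code[4]) {
    unsigned n[3][2] = {{0, 0}, {0, 0}, {0, 0}};
    double prob = 1.0;
    for (size_t i = 0; i < len; i++) {
        int sym = code[base_index(s[i])];
        int bits[2] = { (sym >> 1) & 1, sym & 1 };
        int node = 0;
        for (int k = 0; k < 2; k++) {
            int b = bits[k];
            prob *= (n[node][b] + 1.0) / (n[node][0] + n[node][1] + 2.0);
            n[node][b]++;
            node = 1 + bits[0];   /* second bit is coded under the first */
        }
    }
    return prob;
}
```

For 100 a, 1 c, 1 g and 100 t, the code a:00 c:10 g:11 t:01 (which puts a and t under the same first bit) gets a probability about 17 times higher than a:00 c:01 g:10 t:11, so roughly 4 bits shorter. With this symmetric counter the 24 codes fall into only three distinct groupings, and the gap is small because the toy is stationary and order-0.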
*** scott____ has joined the channel		2010-01-24 20:43:37
<scott____>	you are right I was wrong bad memory	2010-01-24 20:43:57
	the browser crashed so I was off	2010-01-24 20:44:11
<Shelwien>	...	2010-01-24 20:44:28
<scott____>	anyway, I forgot that I had wondered why arb255 did get different compression with different character sets	2010-01-24 20:44:53
<Shelwien>	:)	2010-01-24 20:45:32
<scott____>	I found that if you give each character a weight of one on the leaves and then add as you go back up, so the top cell has a weight of 256, then they compress to the same length	2010-01-24 20:46:03
	it gets the same compression when you have pernutaions of the input file.	2010-01-24 20:46:39
<Shelwien>	http://compressionratings.com/bwt.html#transformation	2010-01-24 20:46:39
	[6] there uses the alphabet permutation which i optimized for enwik	2010-01-24 20:47:19
<scott____>	permutiations	2010-01-24 20:47:29
	bad speilling	2010-01-24 20:47:40
<Shelwien>	you can see that it improves compression for most of the coders	2010-01-24 20:48:03
<scott____>	what improves it?	2010-01-24 20:48:24
	sorry I see it	2010-01-24 20:48:39
	you are correct since I use a binary tree I would need to do that	2010-01-24 20:49:30
	i suspect that when done, the easy way is to check all permutations and see what gives the best	2010-01-24 20:50:25
<Shelwien>	unfortunately that's not easy at all	2010-01-24 20:50:53
<scott____>	but since I group in units of 9 it would be a real mess since 2**9 ways	2010-01-24 20:51:19
	sorry 6 coden unit for 12 bits for 2**13	2010-01-24 20:51:52
	2**12	2010-01-24 20:51:58
	typo after typo	2010-01-24 20:52:13
	and I have not had any beer	2010-01-24 20:52:25
<Shelwien>	think it could help? :)	2010-01-24 20:52:55
<scott____>	it always seems to my body is made for beer	2010-01-24 20:53:13
	bye	2010-01-24 20:57:20
*** scott____ has left the channel		2010-01-24 20:57:26
*** maniscalco has joined the channel		2010-01-24 21:51:41
<Shelwien>	hi :)	2010-01-24 21:52:11
*** Krugz has joined the channel		2010-01-24 21:57:20
<maniscalco>	so, shelwien, you have been doing work lately to simplify CM for new people	2010-01-24 21:59:09
	?	2010-01-24 21:59:11
<Shelwien>	in a way, i guess	2010-01-24 21:59:38
	not that it really can be called "work"	2010-01-24 21:59:58
<maniscalco>	i also always wondered about ACB. its a bit of a legend really	2010-01-24 22:00:05
	ha	2010-01-24 22:00:13
	if you are having fun ....	2010-01-24 22:00:18
	you can't call it work	2010-01-24 22:00:22
<Shelwien>	no, i mean it didn't really take much time	2010-01-24 22:00:32
<maniscalco>	well, sure	2010-01-24 22:00:39
<Shelwien>	i was experimenting before with data structures	2010-01-24 22:00:55
<maniscalco>	but making something that can be followed easily .... that takes effort	2010-01-24 22:00:57
<Shelwien>	and made that "tree" parser	2010-01-24 22:01:10
	then added some simple coding to it	2010-01-24 22:01:18
	well, that was long ago	2010-01-24 22:01:34
	and recently a new guy appeared here, and started asking questions about CM	2010-01-24 22:02:05
<maniscalco>	i have not looked at the code, just the posts. "easily followed" russian style coding ... or for the rest of us? (^:	2010-01-24 22:02:05
	(I know, you are not russian)	2010-01-24 22:02:14
<Shelwien>	err, i am :)	2010-01-24 22:02:19
<maniscalco>	i thought you are ukrainian!	2010-01-24 22:02:29
<Shelwien>	its kinda the same	2010-01-24 22:02:36
<maniscalco>	hmm	2010-01-24 22:02:41
	tell my wife that	2010-01-24 22:02:47
	she is "ot bulgaria"	2010-01-24 22:02:59
<Shelwien>	well, that's different	2010-01-24 22:03:10
<maniscalco>	how so ?	2010-01-24 22:03:16
<Shelwien>	but russia and ukraine were parts of the same USSR not that long before :)	2010-01-24 22:03:25
<maniscalco>	she gave me a daughter by the way	2010-01-24 22:03:31
	yesterday	2010-01-24 22:03:33
<Shelwien>	yeah, i've seen your blog :)	2010-01-24 22:03:40
	congratulations :)	2010-01-24 22:03:46
<maniscalco>	thank you	2010-01-24 22:03:50
	but russia cut off gas to ukraine	2010-01-24 22:04:02
	a year ago	2010-01-24 22:04:08
	how close can they be ?	2010-01-24 22:04:13
<Shelwien>	not really	2010-01-24 22:04:13
<maniscalco>	oh, well in bulgaria, they were cold as a result	2010-01-24 22:04:28
<Shelwien>	err, its 12 hours to moscow by train from here	2010-01-24 22:04:30
<maniscalco>	I was there	2010-01-24 22:04:33
<Shelwien>	and anyway i'm russian	2010-01-24 22:04:45
<maniscalco>	we bostonians were not so cold,but the bulgarians were	2010-01-24 22:04:49
	(^:	2010-01-24 22:04:51
	ok, i don't mean to offend	2010-01-24 22:05:03
	i just try to understand	2010-01-24 22:05:07
	(^:	2010-01-24 22:05:11
<Shelwien>	no problem, but ukraine is still kinda an artificial country	2010-01-24 22:05:41
<maniscalco>	so, how did you come to know ACB ?	2010-01-24 22:05:41
	oh.	2010-01-24 22:05:51
<Shelwien>	well, you can say that it was what got me into compression	2010-01-24 22:06:24
<maniscalco>	but i didnt think that anyone knew his algorithm	2010-01-24 22:06:43
	its in the books	2010-01-24 22:06:46
	but not really described	2010-01-24 22:06:52
<Shelwien>	there's even a source	2010-01-24 22:06:59
<maniscalco>	and i never saw any results	2010-01-24 22:07:03
<Shelwien>	and original article(s)	2010-01-24 22:07:05
	its just too old	2010-01-24 22:07:10
<maniscalco>	in russian, maybe?	2010-01-24 22:07:13
<Shelwien>	its from 1997	2010-01-24 22:07:13
	there's a link in the thread, did you see it?	2010-01-24 22:07:26
<maniscalco>	i was busy having a daughter	2010-01-24 22:07:42
	i have new results for M03 btw	2010-01-24 22:07:58
	but they are not "proven"	2010-01-24 22:08:04
	as i have not had time to match the decoder model to the changes in the encoder	2010-01-24 22:08:15
	I know not to announce results before the decoder	2010-01-24 22:08:24
	but	2010-01-24 22:08:25
	easy change	2010-01-24 22:08:30
	I predict 206,000 for book1	2010-01-24 22:08:42
	same speed	2010-01-24 22:08:47
	well.... 206,xxx	2010-01-24 22:08:59
	that is	2010-01-24 22:09:01
	it turns out that the # of symbols in a parent context is closely related to the distribution pattern in the children	2010-01-24 22:09:41
	if there are more children to a single parent etc	2010-01-24 22:09:53
	that is	2010-01-24 22:09:55
	if a parent has N symbols	2010-01-24 22:10:02
	and there are K children contexts to that one parent context	2010-01-24 22:10:13
	add (K/N) to the model	2010-01-24 22:10:27
	and you get big results	2010-01-24 22:10:34
	(3 bits)	2010-01-24 22:10:38
	im not sure i understand why yet	2010-01-24 22:10:52
	and my daughter was born so i have not had time to work on it	2010-01-24 22:11:06
*** Shelwien has left the channel		2010-01-24 22:12:15
*** Guest9968193 has joined the channel		2010-01-24 22:12:22
<Shelwien>	sorry, got disconnected	2010-01-24 22:12:41
<maniscalco>	np	2010-01-24 22:12:45
	can you read the previous posts?	2010-01-24 22:12:59
	or should i repost	2010-01-24 22:13:08
<Shelwien>	i have the log, yeah	2010-01-24 22:13:16
	btw	2010-01-24 22:13:28
	http://nishi.dreamhosters.com/chantailh.cgi?100 is the log	2010-01-24 22:13:31
	and also http://nishi.dreamhosters.com/log/	2010-01-24 22:13:43
<maniscalco>	ah,	2010-01-24 22:14:20
	good thanks	2010-01-24 22:14:22
	"/log/" is useful	2010-01-24 22:14:53
	didnt know about that	2010-01-24 22:14:59
<Shelwien>	there's also search here	2010-01-24 22:15:14
	!grep maniscalco>	2010-01-24 22:15:19
<maniscalco>	(^:	2010-01-24 22:15:32
	im not so good with computers	2010-01-24 22:15:39
	i just know the algorithms	2010-01-24 22:15:45
<Shelwien>	that's kinda unrelated to computers :)	2010-01-24 22:15:52
<maniscalco>	yes	2010-01-24 22:15:57
<Shelwien>	i mean, its my script :)	2010-01-24 22:16:04
<maniscalco>	anyhow, I have found that there is a relationship between the distribution of symbols in the child context with regard to the parent context	2010-01-24 22:16:42
	more or less symbols in the parent	2010-01-24 22:17:03
<Shelwien>	well, sure, it can be seen in PPM/CM as well	2010-01-24 22:17:14
<maniscalco>	right. BWT is like PPM in this way	2010-01-24 22:17:33
<Shelwien>	like, there're a few kinds of contexts	2010-01-24 22:17:37
<maniscalco>	i just cant quite account for the big improvements just by adding this to the models	2010-01-24 22:17:55
<Shelwien>	most important are deterministic, with 1 or ~1 symbol in the alphabet	2010-01-24 22:18:10
<maniscalco>	i guess it suggests that my original models were not respecting this phenomenon	2010-01-24 22:18:13
	right	2010-01-24 22:18:21
	i think the same	2010-01-24 22:18:25
	its more of a special case	2010-01-24 22:18:30
<Shelwien>	but there're more special cases	2010-01-24 22:18:34
<maniscalco>	which happens to be common at high order	2010-01-24 22:18:39
<Shelwien>	for example, consider dictionary files	2010-01-24 22:18:50
<maniscalco>	like shkarin does with "binary contexts"	2010-01-24 22:18:58
	its really just for special high order cases	2010-01-24 22:19:08
<Shelwien>	sure, these binary contexts are deterministic	2010-01-24 22:19:18
<maniscalco>	keeps the general model from getting messed up	2010-01-24 22:19:21
<Shelwien>	they're only binary because of escape :)	2010-01-24 22:19:27
	anyway, i was talking about wordlists	2010-01-24 22:19:53
	like english.dic	2010-01-24 22:19:57
<maniscalco>	well, i think they reflect the special syntax of the language rather than the general rules of the language	2010-01-24 22:19:58
	go ahead ...	2010-01-24 22:20:06
	there is a thorn for BWT	2010-01-24 22:20:15
	im sure i can get this to work for M03 someday	2010-01-24 22:20:27
<Shelwien>	they're a special case for BWT (and PPM/CM too, but less so)	2010-01-24 22:20:32
<maniscalco>	but its a special case	2010-01-24 22:20:34
<Shelwien>	because there's a lot of "garbage" contexts	2010-01-24 22:21:00
<maniscalco>	i never really thought about it because i know its detectable and a filter can adjust to help bwt	2010-01-24 22:21:04
	yes	2010-01-24 22:21:07
	and you lose the significance of garbage vs important with BWT	2010-01-24 22:21:23
	it cant tell what is important before the transform	2010-01-24 22:21:37
	so it mashes them all together	2010-01-24 22:21:44
	where as	2010-01-24 22:21:46
<Shelwien>	yeah, though i never heard of a PPM/CM which would specifically handle such cases	2010-01-24 22:22:02
<maniscalco>	PPM understands that garbage is at the lower orders	2010-01-24 22:22:04
<Shelwien>	err, no	2010-01-24 22:22:13
	as i said, wordlists are a good example, though the same behavior appears anywhere	2010-01-24 22:22:30
<maniscalco>	sure, the statistics at lower orders are less affected by the oddities	2010-01-24 22:22:34
	because they are normalized more frequently	2010-01-24 22:22:46
<Shelwien>	i mean, you have repeatable suffixes in words	2010-01-24 22:22:47
<maniscalco>	that is only part of it though	2010-01-24 22:22:54
	yes	2010-01-24 22:22:57
<Shelwien>	like -ment or -ness, lots of even longer ones	2010-01-24 22:22:58
	and with wordlists, most models try to predict the prefix of the next word	2010-01-24 22:23:24
	in context of <suffix><LF>	2010-01-24 22:23:34
	and its a big mistake	2010-01-24 22:23:47
<maniscalco>	and do you think that the models tend to default "escape" to lower orders each time they hit "ing" "est"etc	2010-01-24 22:23:49
	so they reach the root and re-model quickly ?	2010-01-24 22:24:01
<Shelwien>	no, that'd be good instead, if they could escape to low orders there	2010-01-24 22:24:09
	but normally they see some statistics in high orders	2010-01-24 22:24:27
<maniscalco>	it would be intersting to have secondary models	2010-01-24 22:24:39
<Shelwien>	and try to use them, but distributions don't match at all	2010-01-24 22:24:48
<maniscalco>	which are intended to model "garbage"	2010-01-24 22:24:50
<Shelwien>	well, ppmonstr	2010-01-24 22:25:01
<maniscalco>	and find ways of mapping back into sensical positions in the normal model	2010-01-24 22:25:07
	that is "ing" happens	2010-01-24 22:25:13
<Shelwien>	has a sparse submodel which kinda can handle that i guess	2010-01-24 22:25:13
<maniscalco>	but tends to map back to lower order 4-5 contexts	2010-01-24 22:25:30
	regardless of how long xxxxxxxxx-ing is	2010-01-24 22:25:40
	but this is language based	2010-01-24 22:25:56
	i guess im more interested in solving arbitrary signals	2010-01-24 22:26:12
	there, bwt is usually god	2010-01-24 22:26:19
	good	2010-01-24 22:26:20
<Shelwien>	well, i think this requires a different approach	2010-01-24 22:26:30
<maniscalco>	someday, i plan on adding ppm like statistics to M03	2010-01-24 22:26:43
	its trivial to gather the statistics	2010-01-24 22:26:52
<Shelwien>	like, specifically detecting garbage contexts by their alphabet size/distributions/entropy	2010-01-24 22:27:00
<maniscalco>	but its complex (memory wise) to know what context you are currently modelling	2010-01-24 22:27:16
	that's why i say a different small model is god	2010-01-24 22:27:34
	good	2010-01-24 22:27:36
	for english	2010-01-24 22:27:47
	an order 3 model would pick up ing, ed(blank) etc	2010-01-24 22:28:04
	and if the standard model could respect the predicted changes in the small model	2010-01-24 22:28:26
	....	2010-01-24 22:28:29
	then the standard model might predict "escape" better	2010-01-24 22:28:46
	regardless of if the state in the standard model is fairly new	2010-01-24 22:29:02
	or not	2010-01-24 22:29:05
	of dict .... imagine a new context "sort"	2010-01-24 22:29:37
	then comes	2010-01-24 22:29:41
	"i"	2010-01-24 22:29:44
	then "n"	2010-01-24 22:29:47
	etc....	2010-01-24 22:29:49
	this short context model would have a high indicator that "g" and EOL	2010-01-24 22:30:05
	are coming next	2010-01-24 22:30:08
<Shelwien>	sure	2010-01-24 22:30:18
<maniscalco>	even though the main model has never seen "sort"	2010-01-24 22:30:22
<Shelwien>	but the problem is after that :)	2010-01-24 22:30:24
<maniscalco>	i dont think so	2010-01-24 22:30:32
<Shelwien>	like, in a sorted wordlist	2010-01-24 22:30:35
<maniscalco>	localized errors are the problem	2010-01-24 22:30:40
	well, for BWT at least	2010-01-24 22:30:45
<Shelwien>	by history of context "ing<LF>" you would never be able to predict the next symbol	2010-01-24 22:31:08
<maniscalco>	sorted lists are a good example of this	2010-01-24 22:31:08
	right	2010-01-24 22:31:18
	but the cost happens at the escape	2010-01-24 22:31:25
	i think	2010-01-24 22:31:27
	with a sorted list	2010-01-24 22:31:31
	there is no prediction after LF	2010-01-24 22:31:40
<Shelwien>	not really, and that's the problem too	2010-01-24 22:31:49
<maniscalco>	unless you have a human brain	2010-01-24 22:31:52
<Shelwien>	no, i mean there're not that many different letters	2010-01-24 22:32:04
<maniscalco>	wait, im wrong	2010-01-24 22:32:11
<Shelwien>	so after a few escapes	2010-01-24 22:32:11
<maniscalco>	the order 3 model would pick up on <LF>	2010-01-24 22:32:22
<Shelwien>	you'd have all the alphabet in the "=ing" context	2010-01-24 22:32:25
<maniscalco>	and could predict that the next symbol is similar to the last following <LF>	2010-01-24 22:32:36
<Shelwien>	but probability distribution would be completely wrong	2010-01-24 22:32:40
	...yeah, that's the problem	2010-01-24 22:33:03
	the context model _would_ predict that next symbol is similar to the last in context history	2010-01-24 22:33:21
<maniscalco>	i think that no model will be good at this	2010-01-24 22:33:22
	we only see it because we have many years of sampling in our heads	2010-01-24 22:33:34
<Shelwien>	but there're some	2010-01-24 22:33:36
<maniscalco>	we have good prediction	2010-01-24 22:33:42
	because we are keyed to find patterns	2010-01-24 22:33:51
	and language models would require the same advantage of experience to find the same patterns	2010-01-24 22:34:11
<Shelwien>	http://www.maximumcompression.com/data/dict.php	2010-01-24 22:34:12
<maniscalco>	what i dont quite understand here is how limited order BWT does so much better than full BWT	2010-01-24 22:35:45
<Shelwien>	yeah	2010-01-24 22:35:57
<maniscalco>	usually szip order 4 does much better on these types of data	2010-01-24 22:36:02
<Shelwien>	due to data specifics :)	2010-01-24 22:36:04
<maniscalco>	where it is like ppm	2010-01-24 22:36:09
	more than bwt	2010-01-24 22:36:11
*** pinc has joined the channel		2010-01-24 22:36:22
	but M03 is like PPM in that it is context based	2010-01-24 22:36:26
	and it doesn't get this advantage	2010-01-24 22:36:32
	(its not fair!)	2010-01-24 22:36:38
	(^:	2010-01-24 22:36:40
<Shelwien>	well, i think you can still detect it...	2010-01-24 22:36:49
<maniscalco>	obviously, because it makes full order predictions	2010-01-24 22:36:58
	sure	2010-01-24 22:37:08
	a simple filter can find and destroy	2010-01-24 22:37:16
	but that's not the point	2010-01-24 22:37:22
	adaptive models are what we want	2010-01-24 22:37:32
	not bandaids	2010-01-24 22:37:38
<Shelwien>	yeah	2010-01-24 22:37:57
<maniscalco>	which is why i dont have any particular respect for stuff like nanozip	2010-01-24 22:38:02
	big deal	2010-01-24 22:38:05
	you wrote some filters	2010-01-24 22:38:10
	modeling language is the trick	2010-01-24 22:38:26
	how do you adapt to localized syntax quickly	2010-01-24 22:38:44
	?	2010-01-24 22:38:45
	and here, ppm has the edge	2010-01-24 22:38:58
	and cm	2010-01-24 22:39:00
<Shelwien>	yeah	2010-01-24 22:39:10
<maniscalco>	because they are not block based	2010-01-24 22:39:13
	they adapt	2010-01-24 22:39:15
	maybe i could make a BWT like CM	2010-01-24 22:39:28
	with respect to lower order BWT	2010-01-24 22:39:37
	and full	2010-01-24 22:39:39
<Shelwien>	i think there were ideas about weird BWT versions	2010-01-24 22:39:45
<maniscalco>	M03 isnt "weird" (^:	2010-01-24 22:40:02
<Shelwien>	like coding a BWT for block of N	2010-01-24 22:40:06
	and then coding a "patch" to BWT of block of 2*N	2010-01-24 22:40:24
	...or N+k	2010-01-24 22:40:31
<maniscalco>	well, certainly M03 would work for this	2010-01-24 22:40:41
	since it always knows what the current context is	2010-01-24 22:40:49
	but that would require more memory	2010-01-24 22:40:54
<Shelwien>	yeah	2010-01-24 22:41:02
<maniscalco>	and one of the main reasons for BWT is limited memory	2010-01-24 22:41:05
	so what is the point?	2010-01-24 22:41:09
<Shelwien>	speed too	2010-01-24 22:41:15
<maniscalco>	if you have to use more, use PPM/CM	2010-01-24 22:41:17
	sure	2010-01-24 22:41:20
	but PPM can do that	2010-01-24 22:41:24
	and speed and memory are becoming cheaper by the day	2010-01-24 22:41:49
<Shelwien>	well, PPM is kinda dead at this point :)	2010-01-24 22:41:56
<maniscalco>	bzzzt!!!!!	2010-01-24 22:42:12
<Shelwien>	all thanks to Matt :)	2010-01-24 22:42:15
<maniscalco>	but thanks for playing the game	2010-01-24 22:42:22
	come on	2010-01-24 22:42:36
	shkarins work is still very strong	2010-01-24 22:42:47
<Shelwien>	i mean, we don't have anything beside ppmd	2010-01-24 22:42:57
<maniscalco>	and fast compared to cm	2010-01-24 22:43:00
	why do you need more ?	2010-01-24 22:43:18
<Shelwien>	because there are better CMs now	2010-01-24 22:43:31
<maniscalco>	as we say in the states he "knocked it out of the park"	2010-01-24 22:43:35
	still slow	2010-01-24 22:43:40
<Shelwien>	no	2010-01-24 22:43:46
	ccm is faster than ppmd	2010-01-24 22:43:50
	and has better overall compression	2010-01-24 22:44:01
<maniscalco>	really, i would have to look	2010-01-24 22:44:10
	and not at a single file bench mark like large text file	2010-01-24 22:44:25
	that's an easy file to compress	2010-01-24 22:44:32
<Shelwien>	well, considering texts, ccm is relatively bad	2010-01-24 22:44:46
	like 220k on book1	2010-01-24 22:44:51
<maniscalco>	that's the only way i have seen it compared, i confess	2010-01-24 22:45:08
<Shelwien>	but its really good on binaries	2010-01-24 22:45:09
<maniscalco>	really	2010-01-24 22:45:15
	i will look into that	2010-01-24 22:45:19
	that is a problem for BWT	2010-01-24 22:45:30
<Shelwien>	well, there're reasons for that :)	2010-01-24 22:45:35
<maniscalco>	again, short contexts	2010-01-24 22:45:39
	BWT is a great big thing, taken down by small things	2010-01-24 22:45:56
	which is where CM comes in	2010-01-24 22:46:16
	... pick your battles wisely ...	2010-01-24 22:46:32
	got to go for a bit. cat is hungry	2010-01-24 22:47:12
<Shelwien>	:)	2010-01-24 22:47:16
*** maniscalco has left the channel		2010-01-24 22:47:26
*** pinc has left the channel		2010-01-24 23:11:30
*** mike_____ has left the channel		2010-01-24 23:57:38
*** Shelwien has left the channel		2010-01-25 01:18:18
*** Shelwien has joined the channel		2010-01-25 01:32:32
*** STalKer-Y has joined the channel		2010-01-25 04:19:56
*** STalKer-X has left the channel		2010-01-25 04:21:07
*** scott___ has left the channel		2010-01-25 05:55:48
*** pinc has joined the channel		2010-01-25 08:01:44
*** Shelwien has left the channel		2010-01-25 10:01:47
*** Guest9968193 has joined the channel		2010-01-25 10:01:50
*** mondragon has joined the channel		2010-01-25 11:17:52
*** mondragon has left the channel		2010-01-25 11:23:59
*** pinc has left the channel		2010-01-25 12:26:55
*** pinc has joined the channel		2010-01-25 12:27:03
*** pinc has left the channel		2010-01-25 12:27:26
*** pinc has joined the channel		2010-01-25 12:27:36
*** pmcontext has joined the channel		2010-01-25 15:24:29
<pmcontext>	hi ;)	2010-01-25 15:26:16
	finally i think i got the mixer working	2010-01-25 15:28:47
	yesterday it expanded files	2010-01-25 15:29:11
	today its compressing	2010-01-25 15:29:17
	i used o0 + o1 to test it	2010-01-25 15:30:13
	and the w is [0,64] , is this ok ? or should it have more range	2010-01-25 15:30:51
	currently on book1 it is 358591 , i guess it needs more tuning	2010-01-25 15:34:48
	ok after little tuning book1 346224	2010-01-25 15:40:24
*** mike_____ has joined the channel		2010-01-25 15:44:54
	strangely , dw = (( dp*error(b) ) / (p*(ONE-p)) ); gives me bad compression	2010-01-25 15:51:33
	compared to dw = (( dp*error(b) )>>L)	2010-01-25 15:51:46
	i am running out of tuning ideas , darn 346224	2010-01-25 15:56:56
*** scott___ has joined the channel		2010-01-25 16:10:03
<scott___>	hi	2010-01-25 16:11:10
*** mike_____ has left the channel		2010-01-25 16:26:07
*** toffer has joined the channel		2010-01-25 16:48:56
<toffer>	hi	2010-01-25 16:49:04
<pmcontext>	hi	2010-01-25 16:57:41
	at the moment i use dw = (( dp*error(b) )>>L) ;	2010-01-25 16:58:09
	there was a small bug , now it is 351888	2010-01-25 17:00:12
*** pinc has left the channel		2010-01-25 17:00:36
	increased range of w , now 350457	2010-01-25 17:05:06
	the mixer seems to be working ,	2010-01-25 17:11:44
	using only o0 + o1	2010-01-25 17:12:00
<toffer>	your update formula is still wrong...	2010-01-25 17:12:20
<pmcontext>	but its not giving good result yet i guess	2010-01-25 17:12:31
<toffer>	what's plain o1 w/o mixing?	2010-01-25 17:13:36
	and what is L	2010-01-25 17:13:42
	and you need to use more precision for the weights	2010-01-25 17:13:58
<pmcontext>	o1 + o0 mix	2010-01-25 17:14:00
	L = 17 . and w [0 , 512]	2010-01-25 17:14:31
<toffer>	thus w=9 bit	2010-01-25 17:14:39
	and probs are?	2010-01-25 17:14:44
	16 bit?	2010-01-25 17:14:48
<pmcontext>	12 bit	2010-01-25 17:14:54
<toffer>	as i said	2010-01-25 17:14:58
	you can increase	2010-01-25 17:15:00
	to 16	2010-01-25 17:15:02
<pmcontext>	ok i will do now	2010-01-25 17:15:05
<toffer>	and weights to 12	2010-01-25 17:15:08
	erm	2010-01-25 17:15:10
	15	2010-01-25 17:15:12
	thus you have	2010-01-25 17:15:22
	dp*error(b) = 24 bit, signed	2010-01-25 17:15:31
<pmcontext>	increasing w to 12 bit i can	2010-01-25 17:16:08
	but im not sure how to change p	2010-01-25 17:16:17
<toffer>	and 24-17 = 7 bit	2010-01-25 17:16:25
<pmcontext>	the probs are stored as n0 , n1	2010-01-25 17:16:26
<toffer>	...	2010-01-25 17:16:43
	these are counts	2010-01-25 17:16:46
	not probs	2010-01-25 17:16:48
	N bit precision	2010-01-25 17:16:55
	P(y=1) = (n1<<N)/(n0+n1)	2010-01-25 17:17:08
<pmcontext>	dam ok i got it	2010-01-25 17:17:11
	i forgot about that	2010-01-25 17:17:19
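toffer's formula P(y=1) = (n1<<N)/(n0+n1) turns a pair of bit counts into an N-bit fixed-point probability. A minimal sketch follows; the function name and the choice to return one half for an empty context are assumptions, since the chat does not say how pmcontext handles unseen contexts.

```c
#include <stdint.h>

/* Convert bit counts (n0 zeros, n1 ones) to an nbits-wide fixed-point
   estimate of P(y = 1), with 1 <= nbits <= 31. The shift is done in 64 bits
   so large counts cannot overflow. The result ranges over [0, 1 << nbits],
   so a caller still has to keep it away from the two endpoints before
   handing it to an arithmetic coder. */
uint32_t prob_from_counts(uint32_t n0, uint32_t n1, unsigned nbits) {
    uint64_t total = (uint64_t)n0 + n1;
    if (total == 0)
        return 1u << (nbits - 1);   /* assumption: unseen context -> 1/2 */
    return (uint32_t)(((uint64_t)n1 << nbits) / total);
}
```

With this, going from 12-bit to 16-bit probabilities is only a change of `nbits`; the counts themselves stay as they are.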
<toffer>	i'm away for dinner	2010-01-25 17:17:43
<pmcontext>	ok	2010-01-25 17:19:21
	with w increased to 12 bit 349123	2010-01-25 17:20:16
	after p is scaled to 16 bit , since my coder needs 12 bit , should i p>>4 ?	2010-01-25 17:23:41
	348825	2010-01-25 17:32:31
	working on increasing p	2010-01-25 17:36:46
	after i try to increase p 365693, may be i did something bad	2010-01-25 17:40:17
	back to 12 bit and 348530	2010-01-25 17:49:55
	w is 12 bit , p is 12 bit now	2010-01-25 17:50:16
	p 16 bit 365759	2010-01-25 17:58:13
	p 15 bit 355858	2010-01-25 17:58:14
	p 14 bit 349491	2010-01-25 17:58:16
	p 12 bit 348530	2010-01-25 17:58:17
*** scott___ has left the channel		2010-01-25 18:21:46
<toffer>	higher precision give better compression	2010-01-25 18:28:58
	the worse compression is due to overflows	2010-01-25 18:29:08
	for sure	2010-01-25 18:29:12
	*gives better ...	2010-01-25 18:29:21
	set weight precision to 15 bit	2010-01-25 18:30:05
<pmcontext>	ok	2010-01-25 18:30:10
<toffer>	and probability precision to 16 bit	2010-01-25 18:30:18
	in some old m1 i used 20 bit probs	2010-01-25 18:30:28
	overflows can happen like that:	2010-01-25 18:30:44
	(p2-p1)*w + p1	2010-01-25 18:30:53
	p2-p1 will be 16 bit, signed -> thus 17 bit	2010-01-25 18:31:04
	and (p2-p1)*w will be 16 bit, 1 bit sign + 15 bit (weight) = 32 bit	2010-01-25 18:31:35
<pmcontext>	w 15 bit 346455	2010-01-25 18:31:40
<toffer>	so check for overflows in weighting and update	2010-01-25 18:31:55
	you can post the implementation, too	2010-01-25 18:32:04
<pmcontext>	i'm sure w has no overflow problem	2010-01-25 18:32:17
	but not sure when i change p	2010-01-25 18:32:42
	how do i scale the P after mixing for my encoder , it needs 12 bit p	2010-01-25 18:35:16
	if p is now 16 bit . after mixing p = ? bit , and encoder need p = 12 bit	2010-01-25 18:35:59
<toffer>	your mixing equation, please	2010-01-25 18:47:10
<pmcontext>	dp=p2-p1;	2010-01-25 18:47:42
	p = p1 + ((dp*w)>>W_BIT);	2010-01-25 18:47:44
	W_BIT=15;	2010-01-25 18:48:01
	int mix(int p1, int p2){	2010-01-25 18:48:48
	dp=p2-p1;	2010-01-25 18:48:50
	p = p1 + ((dp*w)>>W_BIT);	2010-01-25 18:48:51
<toffer>	w is initialized to 2^(W_BIT-1)	2010-01-25 18:48:53
<pmcontext>	return p;	2010-01-25 18:48:53
	}	2010-01-25 18:48:54
<toffer>	?	2010-01-25 18:49:17
	and the mixer update?	2010-01-25 18:49:35
<pmcontext>	----- about w	2010-01-25 18:49:48
	W_BIT=15;	2010-01-25 18:49:50
	W_ONE=(1<<W_BIT)	2010-01-25 18:49:51
<toffer>	looks ok precisionwise	2010-01-25 18:49:52
<pmcontext>	w=W_ONE/2;	2010-01-25 18:49:53
	----------- update	2010-01-25 18:50:30
	void update(int b){	2010-01-25 18:50:31
	if(p>T && p<(ONE-T)){	2010-01-25 18:50:33
	int dw = (( dp*error(b) )>>L) ;	2010-01-25 18:50:35
	w = max( 0 , min( w+dw , W_ONE ) );	2010-01-25 18:50:36
	}	2010-01-25 18:50:38
	}	2010-01-25 18:50:39
<toffer>	i can tell you what blows higher precision p up	2010-01-25 18:51:25
<pmcontext>	-------------------------- const	2010-01-25 18:51:40
	T=1;	2010-01-25 18:51:42
	PRANGE=12;	2010-01-25 18:51:44
	ONE=(1<<PRANGE);	2010-01-25 18:51:45
	W_BIT=15;	2010-01-25 18:51:47
	W_ONE=(1<<W_BIT);	2010-01-25 18:51:48
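The pasted mixer leaves out types and keeps dp in a global. A compilable sketch of the same two-input mix is below, assuming probabilities of at most 16 bits and the W_BIT=15 weight from the paste; `mix2` is a made-up name. It forms the product in 64 bits, which sidesteps the borderline 32-bit case, and divides instead of shifting so that negative values do not rely on how the compiler implements `>>`.

```c
#include <stdint.h>

enum { W_BIT = 15, W_ONE = 1 << W_BIT };

/* p = p1 + w * (p2 - p1), where w in [0, W_ONE] is a W_BIT fixed-point
   fraction. Division truncates toward zero, so for negative dp the result
   can differ by one unit from the shift-based version in the chat. */
int32_t mix2(int32_t p1, int32_t p2, int32_t w) {
    int64_t dp = (int64_t)p2 - p1;
    return p1 + (int32_t)((dp * w) / W_ONE);
}
```

With w = W_ONE/2 this is the plain average of the two predictions, which matches the initialisation toffer asks about.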
	o.o anything blow ?	2010-01-25 18:52:21
<toffer>	think about it yourself	2010-01-25 18:54:06
	dp*error(b)	2010-01-25 18:54:13
	with 32 bit precision	2010-01-25 18:54:17
<pmcontext>	L=14 currently	2010-01-25 18:55:10
	i tried 16 bit for p but it seems to give 365778 , with 12 bit it is giving 346455	2010-01-25 18:55:11
<toffer>	just think about it	2010-01-25 18:56:26
	dp = p2-p1 -> 16 bit with 1 bit sign	2010-01-25 18:56:37
<pmcontext>	and error is ((b<<PRANGE)-p)	2010-01-25 18:57:31
	where PRANGE=12;	2010-01-25 18:57:33
	so it be 12 bit ?	2010-01-25 18:57:35
<toffer>	you don't understand, i guess	2010-01-25 18:57:58
	w8	2010-01-25 18:58:01
	phoen	2010-01-25 18:58:03
	phone	2010-01-25 18:58:04
	if p1 and p2 have 16 bit precision	2010-01-25 19:16:17
	how many bits does dp=p2-p1 need?	2010-01-25 19:16:27
<pmcontext>	16 bit ?	2010-01-25 19:16:36
<toffer>	0-2^16 =	2010-01-25 19:16:51
<pmcontext>	hm it can also be negtive	2010-01-25 19:17:11
<toffer>	-2^16, thus you get a sign bit	2010-01-25 19:17:13
	altogether 17 bit	2010-01-25 19:17:21
<pmcontext>	so it need 17 bit	2010-01-25 19:17:23
<toffer>	same for error	2010-01-25 19:17:30
	thus	2010-01-25 19:17:32
	dp*error = 34 bit	2010-01-25 19:17:38
	overflow!	2010-01-25 19:17:43
	same for 15 bit	2010-01-25 19:17:50
	but ok for 14 -> 30 bit altogether	2010-01-25 19:17:58
	you see it's just an overflow issue	2010-01-25 19:18:03
<pmcontext>	o.o oh	2010-01-25 19:18:27
	34 bit is more than my int	2010-01-25 19:18:43
<toffer>	that's why you get worse results with precision higher than 14	2010-01-25 19:18:44
	my advice	2010-01-25 19:19:59
	use 16 bit for p	2010-01-25 19:20:03
	and calculate it like that	2010-01-25 19:20:14
	the most you can get is 31 bit with a sign bit	2010-01-25 19:21:17
<pmcontext>	do u think this is bad	2010-01-25 19:21:57
	in mixing	2010-01-25 19:21:59
	p = p1 + ((dp*w)>>W_BIT);	2010-01-25 19:22:01
	dp = 17 bit , w = 15 bit	2010-01-25 19:22:02
	so it is 32 bit before i shift with W_BIT;;	2010-01-25 19:22:04
<toffer>	mixing is ok	2010-01-25 19:22:11
	but the update goes wrong	2010-01-25 19:22:21
	(dp+1>>1)*(error+1>>1)	2010-01-25 19:22:23
	or (dp+2>>2)*error	2010-01-25 19:22:39
	you still need to add g(p)	2010-01-25 19:22:47
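A minimal sketch of the pre-shift toffer suggests, assuming 16 bit probabilities, a 32 bit `int`, and a compiler that shifts negative values arithmetically (true for gcc and clang, but implementation-defined in C). The helper names `half_round` and `scaled_product` are made up for illustration. Note that `dp+1>>1` parses as `(dp+1)>>1`.

```c
#include <stdint.h>

/* Halve x with rounding: (x + 1) >> 1.
   Assumes arithmetic right shift for negative x. */
static int32_t half_round(int32_t x) {
    return (x + 1) >> 1;
}

/* Approximates dp * err / 4 without leaving 32 bits.
   With 16 bit probabilities each halved factor has at most
   15 magnitude bits (plus the boundary value 2^15), so the
   product stays at or below 2^30. */
static int32_t scaled_product(int32_t dp, int32_t err) {
    return half_round(dp) * half_round(err);
}
```

The result is a quarter of the true product, so the learning-rate shift has to be reduced by 2 bits to compensate.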
<pmcontext>	but when i added g(p) it gave bad result	2010-01-25 19:23:07
<toffer>	because you implement it in a wrong fashion	2010-01-25 19:23:31
<pmcontext>	i used g(p) = 1/(p*(ONE-p))	2010-01-25 19:23:57
<toffer>	and again	2010-01-25 19:24:07
	how much bits of precision do you need for dp*e*g(p)	2010-01-25 19:24:23
	you got 32	2010-01-25 19:24:25
	and if you used 12 bit	2010-01-25 19:24:30
	you get	2010-01-25 19:24:34
	36 bit altogether...	2010-01-25 19:24:44
	plus a sign bit	2010-01-25 19:24:49
<pmcontext>	oh o.o	2010-01-25 19:24:50
<toffer>	and	2010-01-25 19:24:55
	your threshold T=1	2010-01-25 19:25:00
	is far too small	2010-01-25 19:25:04
<pmcontext>	T=2?	2010-01-25 19:25:13
<toffer>	...	2010-01-25 19:25:17
	you got 15 bit weights	2010-01-25 19:25:27
	i guess something like 2^5 or ...2^8 is reasonable	2010-01-25 19:25:43
	you have to try	2010-01-25 19:25:52
<pmcontext>	oh ok	2010-01-25 19:25:58
<toffer>	as i said	2010-01-25 19:26:08
	first implement it properly instead of using such improper fixes	2010-01-25 19:26:20
	then	2010-01-25 19:26:27
	you can plot 1/(p(1-p)) and see if it can be approximated with 1	2010-01-25 19:26:44
	what you did	2010-01-25 19:26:48
<pmcontext>	ok i will add g(p) and fix the precision	2010-01-25 19:27:48
<toffer>	ok	2010-01-25 19:28:17
	good luck	2010-01-25 19:28:39
	i gonna be here some more time	2010-01-25 19:28:44
	if you got any questions	2010-01-25 19:28:47
<pmcontext>	ok thank you, no question yet	2010-01-25 19:30:22
<toffer>	and keep in mind that (15 bit mantissa, 1 bit sign) * (15 bit mantissa, 1 bit sign) = 30 bit mantissa, 1 bit sign (not 2)	2010-01-25 19:32:01
<pmcontext>	ok 30 bit and 1 sign bit	2010-01-25 19:32:40
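The corrected counting rule (n magnitude bits times n magnitude bits gives 2n magnitude bits plus a single sign bit) can be checked mechanically. This is a sketch assuming dp = p2 - p1 and error = (b << P_BITS) - p with p in [0, 2^P_BITS); the function names are hypothetical.

```c
#include <stdint.h>

/* Number of magnitude bits needed for |v| (sign bit not counted). */
static int magnitude_bits(int64_t v) {
    uint64_t m = (v < 0) ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    int n = 0;
    while (m) { n++; m >>= 1; }
    return n;
}

/* Worst-case |dp * error| for probabilities with p_bits of precision. */
static int64_t worst_case_product(int p_bits) {
    int64_t one = (int64_t)1 << p_bits;
    int64_t max_dp = one - 1;   /* p2 = one - 1, p1 = 0 */
    int64_t max_err = one;      /* b = 1, p = 0 */
    return max_dp * max_err;
}

/* A signed 32 bit int holds 31 magnitude bits plus the sign. */
static int fits_in_int32(int p_bits) {
    return magnitude_bits(worst_case_product(p_bits)) <= 31;
}
```

Under these assumptions 14 and 15 bit probabilities give a product that still fits, while 16 bit needs 32 magnitude bits and overflows.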
	oh and when the p is 16 bit , i do p>>4 to get 12 bit for encoder ?	2010-01-25 19:35:56
<toffer>	yes	2010-01-25 19:36:28
	but you can use higher precisions, too	2010-01-25 19:36:40
	which helps	2010-01-25 19:36:44
	but first make the calculations work	2010-01-25 19:36:49
<pmcontext>	ok	2010-01-25 19:36:54
<toffer>	and compare to order1 only	2010-01-25 19:39:01
<pmcontext>	dw = (( dp*error(b) ) / (p*(ONE-p)) ) >> L	2010-01-25 19:45:23
	is this look ok ? i mean the formula	2010-01-25 19:45:25
	im still working on precision	2010-01-25 19:45:26
<toffer>	dp is?	2010-01-25 19:48:21
	p2-p1?	2010-01-25 19:48:38
<pmcontext>	dp is p2 - p1	2010-01-25 19:48:44
<toffer>	ok	2010-01-25 19:48:48
	error is?	2010-01-25 19:48:52
<pmcontext>	((b<<PRANGE)-p)	2010-01-25 19:49:02
<toffer>	your terminology confuses me a bit	2010-01-25 19:49:13
	it'd be better if you make it a bit clearer	2010-01-25 19:49:26
	like P_BITS, P_ONE=1<<P_BITS	2010-01-25 19:49:37
	and so on	2010-01-25 19:49:39
	W_BITS, ...	2010-01-25 19:49:42
	ONE is?	2010-01-25 19:49:47
<pmcontext>	b is bit	2010-01-25 19:49:54
	prange , precision of P	2010-01-25 19:49:55
	and p is prediction we got after mix	2010-01-25 19:49:57
	PRANGE=14;	2010-01-25 19:50:10
<toffer>	i mean ONE is 1<<PRANGE?	2010-01-25 19:50:10
<pmcontext>	ONE=(1<<PRANGE);	2010-01-25 19:50:11
<toffer>	ok then	2010-01-25 19:50:14
<pmcontext>	yes	2010-01-25 19:50:16
<toffer>	so T is	2010-01-25 19:50:17
<pmcontext>	1<<5	2010-01-25 19:50:26
	T = 32	2010-01-25 19:50:45
<toffer>	looks ok	2010-01-25 19:50:49
	and did you make some sign checks	2010-01-25 19:50:58
	i.e. not w - dw	2010-01-25 19:51:08
<pmcontext>	w + dw	2010-01-25 19:51:19
<toffer>	did we define dw = dH/dw or dw=-dH/dw	2010-01-25 19:51:48
	it has to read w - dH/dw for minimization	2010-01-25 19:52:27
	so check your signs	2010-01-25 19:53:21
<pmcontext>	i dunno but i remember this	2010-01-25 19:54:06
	dw = (p2-p1) * (y-p) * g(p) , its value is positive if i get y=1 and p2 is greater,	2010-01-25 19:54:08
	that means w should increase to give more weight to p2 model	2010-01-25 19:54:09
	then i have to add , w + dw	2010-01-25 19:54:11
<toffer>	i checked it again	2010-01-25 19:58:34
	dH/dw = - dp/dw (y-p) 1/(p(1-p))	2010-01-25 19:58:53
	and dw = -dH/dw	2010-01-25 19:59:02
	thus it's correct	2010-01-25 19:59:08
<pmcontext>	ok i pass :D	2010-01-25 19:59:27
<toffer>	H = -( y ln p + (1-y) ln (1-p) ) -> dH/dw = dp/dw (y-p) 1/(p(1-p)) and dp/dw = (p2-p1), if p=(p2-p1)w + p1	2010-01-25 20:00:31
	H = -( y ln p + (1-y) ln (1-p) ) -> dH/dw = - dp/dw (y-p) 1/(p(1-p)) and dp/dw = (p2-p1), if p=(p2-p1)w + p1	2010-01-25 20:00:49
	anyway	2010-01-25 20:00:53
	your formula is correct	2010-01-25 20:00:57
	w' = w - L dH/dw = w - L( - dp/dw (y-p) 1/(p(1-p)) )	2010-01-25 20:01:21
<pmcontext>	w' = w + L dp/dw (y-p) 1/(p(1-p)) :D	2010-01-25 20:02:50
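For reference, the derivation exchanged above written out step by step, using the same symbols as the chat (y is the coded bit, L the learning rate, p the mixed prediction). The only step not spelled out in the chat is the intermediate derivative with respect to p.

```latex
\begin{align*}
H &= -\bigl(y \ln p + (1-y)\ln(1-p)\bigr), \qquad p = p_1 + w\,(p_2 - p_1) \\
\frac{\partial H}{\partial p} &= -\frac{y}{p} + \frac{1-y}{1-p} = -\frac{y-p}{p(1-p)} \\
\frac{\partial p}{\partial w} &= p_2 - p_1 \\
\frac{\partial H}{\partial w} &= -(p_2 - p_1)\,\frac{y-p}{p(1-p)} \\
w' &= w - L\,\frac{\partial H}{\partial w} = w + L\,(p_2 - p_1)\,\frac{y-p}{p(1-p)}
\end{align*}
```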
	fixing precision is hard , may be i increase one bit at a time	2010-01-25 20:03:42
	when p is 13 bit , it improved compression	2010-01-25 20:04:06
	when i put 14 bit it went lil bad	2010-01-25 20:04:15
	p 13 bit 365151	2010-01-25 20:06:27
<toffer>	no wonder...	2010-01-25 20:06:52
	your update formula again,please	2010-01-25 20:07:10
<pmcontext>	int dw = (( dp*error(b) ) / (p*(ONE-p)) );	2010-01-25 20:07:22
	w = max( 0 , min( w+dw , W_ONE ) );	2010-01-25 20:07:42
<toffer>	p(1-p) has 26 bit precision	2010-01-25 20:08:21
<pmcontext>	yes	2010-01-25 20:08:34
	but isnt it 27 bit ?	2010-01-25 20:08:50
<toffer>	no	2010-01-25 20:08:53
	it's always positive	2010-01-25 20:08:58
	and dp*error(b) has 27 bit	2010-01-25 20:09:03
<pmcontext>	oh yes	2010-01-25 20:09:03
<toffer>	26 bit mantissa	2010-01-25 20:09:11
	thus 26bit - 26 bit = how many bit of adjustment?	2010-01-25 20:09:30
	you see	2010-01-25 20:09:37
<pmcontext>	0 bit o.o ?	2010-01-25 20:09:46
<toffer>	right	2010-01-25 20:09:48
	maybe due to roundin 1 bit	2010-01-25 20:09:54
	rounding	2010-01-25 20:09:57
<pmcontext>	yes could be	2010-01-25 20:10:10
<toffer>	as you remember	2010-01-25 20:10:17
	T = 1<<5	2010-01-25 20:10:22
<pmcontext>	yes	2010-01-25 20:10:27
<toffer>	thus p(1-p) is guaranteed to have 10 bit, at least	2010-01-25 20:10:35
	right?	2010-01-25 20:10:37
<pmcontext>	yes	2010-01-25 20:10:45
<toffer>	si you can scale it down a bit	2010-01-25 20:10:58
	so	2010-01-25 20:11:00
<pmcontext>	dw >> L ?	2010-01-25 20:11:25
<toffer>	... / ( (p*(ONE-p)) >> 8 )	2010-01-25 20:11:26
	no	2010-01-25 20:11:29
<pmcontext>	oh	2010-01-25 20:11:29
<toffer>	proper rounding can be important, too	2010-01-25 20:12:00
	(... + (1<<7)) >> 8	2010-01-25 20:12:11
<pmcontext>	ah i see	2010-01-25 20:12:11
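Putting the pieces together, a fixed-point version of the mix and the weight update might look like the sketch below. It is not pmcontext's actual code: the constants (13 bit probabilities, 15 bit weights, T = 1<<5) are taken from the discussion, while the function names and the use of division instead of a shift in the mix are assumptions.

```c
#include <stdint.h>

enum {
    P_BITS = 13, P_ONE = 1 << P_BITS,   /* probability precision */
    W_BITS = 15, W_ONE = 1 << W_BITS,   /* weight precision */
    T = 1 << 5                          /* probability clamp threshold */
};

static int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi) {
    return x < lo ? lo : (x > hi ? hi : x);
}

/* p = p1 + w*(p2-p1), kept inside [T, P_ONE-T] so p*(P_ONE-p) never hits 0. */
static int32_t mix2(int32_t p1, int32_t p2, int32_t w) {
    int32_t dp = p2 - p1;                /* 13 bit + sign */
    int32_t p = p1 + (dp * w) / W_ONE;   /* 28 bit product, no overflow */
    return clamp_i32(p, T, P_ONE - T);
}

/* w' = w + dp*err / (p*(1-p)), with the 26 bit divisor scaled down
   by 2^8 using round-to-nearest: (x + (1<<7)) >> 8. */
static int32_t update_weight(int32_t w, int32_t p1, int32_t p2,
                             int32_t p, int bit) {
    int32_t dp  = p2 - p1;
    int32_t err = (bit << P_BITS) - p;
    int32_t g   = (p * (P_ONE - p) + (1 << 7)) >> 8;  /* always >= 1 here */
    int32_t dw  = (dp * err) / g;
    return clamp_i32(w + dw, 0, W_ONE);
}
```

With these constants the effective learning rate is 2^8 / 2^15 = 2^-7; changing the scale-down shift changes the rate accordingly.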
<Shelwien>	hi	2010-01-25 20:12:58
	also i think its better to use divisions in experimental formulas	2010-01-25 20:13:39
	like <pmcontext> p = p1 + ((dp*w)>>W_BIT);	2010-01-25 20:14:01
	with negative dp it would become -1 max	2010-01-25 20:14:34
<pmcontext>	hi shelwien	2010-01-25 20:15:04
<Shelwien>	and 0 with positive, which is asymmetric	2010-01-25 20:15:08
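The asymmetry Shelwien points out is easy to reproduce. In this sketch (function names are hypothetical) the shift version relies on arithmetic right shift of negative values, which is what gcc and clang do but is implementation-defined in C; the division version truncates toward zero on every conforming compiler.

```c
/* Scale by 2^-bits using a shift: rounds toward minus infinity,
   so small negative inputs become -1 while small positive ones become 0. */
static int scale_shift(int x, int bits) {
    return x >> bits;
}

/* Scale by 2^-bits using division: truncates toward zero,
   so positive and negative inputs are treated symmetrically. */
static int scale_div(int x, int bits) {
    return x / (1 << bits);
}
```

For example, with bits = 4 an input of -1 gives -1 through the shift and 0 through the division, while +1 gives 0 in both.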
<pmcontext>	OMG awsome i did p*(ONE-p)) >> 8 and suddenly it is at 346355, came down from 365k	2010-01-25 20:15:23
	((p*(ONE-p)) >> 8)	2010-01-25 20:15:49
	p*(ONE-p)) >> (7+1)	2010-01-25 20:16:07
	sorry bad typing	2010-01-25 20:16:30
<toffer>	hi eugene	2010-01-25 20:17:51
	i got o1 decomposition working	2010-01-25 20:17:56
	with proper decompression	2010-01-25 20:18:04
	speed increased by 700kb/s from 1.8mb/s to 2.5mb/s	2010-01-25 20:18:15
<Shelwien>	good i guess... though did I benchmark that before? guess not...	2010-01-25 20:19:20
<toffer>	nope	2010-01-25 20:19:30
	compression on big files improved	2010-01-25 20:19:37
	the o1 decomposition virtually replaces an o1 moel	2010-01-25 20:19:44
	model	2010-01-25 20:19:46
	but on short files saving the codelength hurts	2010-01-25 20:19:56
	i'm working on that	2010-01-25 20:20:01
<Shelwien>	maybe disable it for short files? :)	2010-01-25 20:20:17
<toffer>	i can still drop to o0	2010-01-25 20:22:28
	or to flat decomposition	2010-01-25 20:22:34
	currently i'm just storing the code lengths	2010-01-25 20:22:51
*** pinc has joined the channel		2010-01-25 20:22:54
	i experimented with masked decompositions	2010-01-25 20:23:19
	0x40df gives 0.1 bpc better "decomposition compression" for text on e7/e8	2010-01-25 20:23:45
<Shelwien>	err... like partial o1 or what?	2010-01-25 20:23:46
	ah	2010-01-25 20:23:56
<toffer>	the speedup is about 2..3%	2010-01-25 20:23:59
	compared to plain o1	2010-01-25 20:24:03
	on the other hand	2010-01-25 20:24:05
	o1 is better elsewhere	2010-01-25 20:24:16
<Shelwien>	well, surely its more general	2010-01-25 20:24:53
	there're russian (and finnish) texts in my benchmark, so 40DF might not be that good...	2010-01-25 20:25:16
<toffer>	atm i'd just keep o1	2010-01-25 20:25:34
	and maybe change the coder structure to add a parameter for that	2010-01-25 20:25:45
	currently the decomposition is constructed outside of the actual compressor	2010-01-25 20:26:00
	as to speedup	2010-01-25 20:26:05
<Shelwien>	"Francesco used Huffman coding before arithmetic coding is his Rings compressor. That increased both speed and strength of compression."	2010-01-25 20:26:28
<toffer>	plain: 1.8mb/s, o0: 2.2mb/s, o1: 2.5mb/s	2010-01-25 20:26:30
<Shelwien>	did you see that? :)	2010-01-25 20:26:32
<toffer>	i read it a while ago	2010-01-25 20:26:43
	code greatly simplified,too	2010-01-25 20:28:36
	since the previous nibble caching is highly distorted with variable length codes	2010-01-25 20:28:52
	the hit rate dropped from 50..70% to 10%	2010-01-25 20:29:05
	i abandoned the 1% speedup	2010-01-25 20:29:18
	which was left from that	2010-01-25 20:29:23
<Shelwien>	btw, maybe o1 decomposition can help with hashing too?	2010-01-25 20:29:37
<toffer>	virtually the hash table is twice as large	2010-01-25 20:29:49
	since it stores 3.92 bits per char	2010-01-25 20:30:00
	for e7	2010-01-25 20:30:02
	instead of 8	2010-01-25 20:30:05
	bpc	2010-01-25 20:30:19
	with just 32 mb i can already compress e8 to 21 4xx xxx	2010-01-25 20:32:00
	previously i needed something like 100mb for that	2010-01-25 20:32:15
	and o1 helps with hashing greatly, since almost every symbol is coded with just a single random memory access	2010-01-25 20:32:59
<Shelwien>	well, why don't we just benchmark it?	2010-01-25 20:35:13
	or does it require more tuning first?	2010-01-25 20:35:28
<toffer>	it's untuned	2010-01-25 20:35:33
	and i first want to make code lengths compressed	2010-01-25 20:35:44
<Shelwien>	err... do you store them uncompressed now? %)	2010-01-25 20:36:06
<toffer>	at the moment they require 256*128 bit	2010-01-25 20:36:09
	as i said	2010-01-25 20:36:13
	i just worked on the algoritihms	2010-01-25 20:36:18
	yes	2010-01-25 20:36:27
	algorithms	2010-01-25 20:36:32
<Shelwien>	well, for me some o0 coding is usually easier than bit packing :)	2010-01-25 20:37:07
	because i already have rc and counter classes anyway, but no bit i/o :)	2010-01-25 20:37:37
<toffer>	i know	2010-01-25 20:38:04
	i didn't want to do something like that	2010-01-25 20:38:12
	i wanted to store lengths hierarchically at least	2010-01-25 20:38:31
	and i guess i'd use ordinary stationary counters there	2010-01-25 20:39:16
<Shelwien>	well, i wonder whether a binary tree would be better there	2010-01-25 20:39:56
	or some coding of length array	2010-01-25 20:40:03
	as to that, jpeg has some kinda funny lengthtable coding	2010-01-25 20:40:42
	they sorted all the lengths there, from 0 to max	2010-01-25 20:41:12
	and for each length a number of symbols encoded instead (runlength)	2010-01-25 20:41:52
	of course, 0 if length is not used too	2010-01-25 20:42:10
	nice idea imho, but implies an alphabet permutation	2010-01-25 20:42:56
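A rough sketch of the length-table layout Shelwien describes: a count of symbols per code length, followed by the symbols listed in order of increasing length (the alphabet permutation). This only illustrates the idea for a single 256-symbol table and is not the exact JPEG format; `NSYM`, `MAXLEN` and both function names are assumptions, and every length is assumed to be at most `MAXLEN`.

```c
#include <stdint.h>
#include <string.h>

enum { NSYM = 256, MAXLEN = 16 };

/* counts[l] = number of symbols whose code length is l (0 = unused).
   perm lists all symbols sorted by length, ties in symbol order. */
static void pack_lengths(const uint8_t len[NSYM],
                         uint16_t counts[MAXLEN + 1],
                         uint8_t perm[NSYM]) {
    int k = 0;
    memset(counts, 0, (MAXLEN + 1) * sizeof counts[0]);
    for (int s = 0; s < NSYM; s++) counts[len[s]]++;
    for (int l = 0; l <= MAXLEN; l++)
        for (int s = 0; s < NSYM; s++)
            if (len[s] == l) perm[k++] = (uint8_t)s;
}

/* Inverse: rebuild the per-symbol length array from counts and perm. */
static void unpack_lengths(const uint16_t counts[MAXLEN + 1],
                           const uint8_t perm[NSYM],
                           uint8_t len[NSYM]) {
    int k = 0;
    for (int l = 0; l <= MAXLEN; l++)
        for (int i = 0; i < counts[l]; i++)
            len[perm[k++]] = (uint8_t)l;
}
```

The counts are small and cheap to store; the cost moves into the permutation, which is the drawback mentioned above.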
<pmcontext>	i g2g and thank you	2010-01-25 21:12:53
*** pmcontext has left the channel		2010-01-25 21:13:11
<Shelwien>	still, i wonder what would be the effect	2010-01-25 21:35:00
	on a file with random blocks in it	2010-01-25 21:35:40
<toffer>	somehow it works badly	2010-01-25 21:38:48
	guess i gonna do some basic o1 coding	2010-01-25 21:38:57
	or something like that	2010-01-25 21:39:00
<Shelwien>	no, i mean how it would affect your o1 decomposition	2010-01-25 21:39:19
<toffer>	but i just found that directly skipping deterministic o1 contexts improves speed	2010-01-25 21:39:29
	no wonder	2010-01-25 21:39:43
	but i didn't expect o1 contexts to have determinism	2010-01-25 21:39:53
<Shelwien>	even o0 can :)	2010-01-25 21:40:13
<toffer>	very unlikely	2010-01-25 21:40:43
	but yes	2010-01-25 21:40:45
<Shelwien>	anyway, i was thinking about blockwise-adaptive decomposition	2010-01-25 21:41:51
	like, you can store freqs for some kinda blocks	2010-01-25 21:42:55
<toffer>	first i need to get storage working good enough	2010-01-25 21:42:57
<Shelwien>	...and then reencode the contexts on access or something	2010-01-25 21:43:34
	...guess that won't make much sense because of speed issues	2010-01-25 21:44:05
	which means that you'd need some segmentation instead	2010-01-25 21:44:31
<toffer>	anyway for me the priority is like that	2010-01-25 21:44:49
	1. get things woring	2010-01-25 21:44:52
	working	2010-01-25 21:44:57
	2. make them pretty	2010-01-25 21:44:59
<Shelwien>	...and a match model unaffected by segmentation	2010-01-25 21:45:01
<toffer>	i could post a list of code lengths	2010-01-25 21:46:19
	and see if anybody has some good ideas	2010-01-25 21:46:28
	but well...	2010-01-25 21:46:33
<Shelwien>	you can do that, but any feedback is unlikely :)	2010-01-25 21:46:50
<toffer>	yep	2010-01-25 21:47:13
	guess i gonna do some weight training now	2010-01-25 21:47:56
	and i think the compressor for code lengths will be more difficult than the main moel	2010-01-25 21:48:08
	model	2010-01-25 21:48:10
	^^	2010-01-25 21:48:12
<Shelwien>	"weight training" always sounds like running the optimizer to me :)	2010-01-25 21:48:41
<toffer>	well maybe it's flushing the optimizer between the ears	2010-01-25 21:49:07
	^^	2010-01-25 21:49:11
<Shelwien>	and as to codelengths... it could be more troublesome if you really had to encode blockwise tables	2010-01-25 21:49:44
	but for a single table there's too little stats to make anything complex	2010-01-25 21:50:16
<toffer>	there's much correlation between adjacent contexts	2010-01-25 21:50:18
	thus i gonna take care of it	2010-01-25 21:50:30
<Shelwien>	err... don't forget about symmetry :)	2010-01-25 21:51:01
	i mean, number of XY context affects the number of YZ contexts etc :)	2010-01-25 21:51:29
	for o1 it should be still manageable... but i really got stuck with high-order context in ctx :)	2010-01-25 21:52:11
	btw toffer	2010-01-25 22:35:28
	i've got a new interesting testfile :)	2010-01-25 22:35:42
	maybe two, even	2010-01-25 22:35:48
	there's a 40M exe from microsoft sql server	2010-01-25 22:36:09
	and a 70M pdb file for it :)	2010-01-25 22:37:41
	unlike acrord32 etc, there're no images or deflate streams, just lots of various tables and x86 code :)	2010-01-25 22:38:53
*** pinc has left the channel		2010-01-25 23:03:38
<toffer>	good leg training with just dumbbells is really annoying	2010-01-25 23:17:41
	but the optimizer flush seemed to work	2010-01-25 23:18:24
<Shelwien>	any new ideas?	2010-01-25 23:18:58
<toffer>	well some good contexts for storing code lengths	2010-01-25 23:19:57
	even if i'd just compress o1 stats to 1/4 of their original size	2010-01-25 23:20:13
	it'd pay off compression-wise already	2010-01-25 23:20:19
<Shelwien>	well, it'd still be redundant	2010-01-25 23:20:46
<toffer>	i can upload a sample	2010-01-25 23:21:06
<Shelwien>	if you won't use that info somehow to reduce the codelength of actual data	2010-01-25 23:21:07
<toffer>	if you want	2010-01-25 23:21:08
<Shelwien>	well, why not	2010-01-25 23:23:51
	did you try compressing them with paq8 for redundancy estimation btw?	2010-01-25 23:24:11
<toffer>	no	2010-01-25 23:24:15
	it's in the /dcc folder	2010-01-25 23:27:23
<Shelwien>	oh, you remember where it is now? :)	2010-01-25 23:27:49
<toffer>	as i said	2010-01-25 23:27:57
	it is in the log of my university's pc	2010-01-25 23:28:04
	(explorer	2010-01-25 23:28:09
	)	2010-01-25 23:28:11
<Shelwien>	now, what about uploading it again in the form of 64k byte table? :)	2010-01-25 23:29:15
<toffer>	i usually use texts since i make some plots with octave	2010-01-25 23:30:57
	w8	2010-01-25 23:30:58
	i guess you have a hex editor	2010-01-25 23:31:28
	any idea why that doesn't work	2010-01-25 23:38:50
	putc('a'+tmp[i], to);	2010-01-25 23:38:54
	putc('a'+tmp[i+1], to);	2010-01-25 23:38:55
	// printf( "%2u %2u ", tmp[i], tmp[i+1] );	2010-01-25 23:38:57
	// if (((i+1)&31)==31) { putchar('\n'); }	2010-01-25 23:38:58
	it should output a, b, ...	2010-01-25 23:39:06
	but it doesn't	2010-01-25 23:39:11
<Shelwien>	what's the file mode?	2010-01-25 23:39:48
<toffer>	well finally	2010-01-25 23:41:08
	it was just written to the compressed file	2010-01-25 23:41:16
	instead of stdout	2010-01-25 23:41:20
	it's there now	2010-01-25 23:42:32
<Shelwien>	ok... meanwhile i nearly finished a perl script to do the same :)	2010-01-25 23:43:03
<toffer>	that was just a pretty dumb mistake	2010-01-25 23:43:30
	reminds me of reimplementing hashing	2010-01-25 23:43:43
	and comparing u8 to u16	2010-01-25 23:43:47
	wondering why the comparison failed almost always	2010-01-25 23:43:57
	^^	2010-01-25 23:43:59
<Shelwien>	well, i can't see any problem in your quote :)	2010-01-25 23:44:41
<toffer>	to is the destination file	2010-01-25 23:44:52
	not stdout	2010-01-25 23:44:57
<Shelwien>	sure, but isn't that what you posted?	2010-01-25 23:46:01
<toffer>	i was dumping it like that	2010-01-25 23:46:48
	m1 3 ..\..\enwik7.txt \testset\enwik\enwik7 nul > enwik7_len.bin	2010-01-25 23:46:54
	even with just o0 coding	2010-01-25 23:47:22
	that stuff can be compressed a lot	2010-01-25 23:47:30
	guess some simple model will be sufficient	2010-01-25 23:47:58
<Shelwien>	undef $/;	2010-01-25 23:51:46
	open I, "<enwik7_len.txt";	2010-01-25 23:51:47
	binmode I;	2010-01-25 23:51:47
	$a = <I>;	2010-01-25 23:51:47
	close I;	2010-01-25 23:51:47
	$a =~ s/[\x00-\x20]+$//;	2010-01-25 23:51:47
	$a =~ s/^[\x00-\x20]+//;	2010-01-25 23:51:48
	@b = split /[\x00-\x20]+/, $a;	2010-01-25 23:51:50
	$c = join "", map chr, @b;	2010-01-25 23:51:53
	open O, ">enwik7_len0.bin";	2010-01-25 23:51:54
	binmode O;	2010-01-25 23:51:57
	print O $c;	2010-01-25 23:51:58
	close O;	2010-01-25 23:52:00
<toffer>	i hate perl	2010-01-25 23:54:18
	as you might remember	2010-01-25 23:54:24
	i tried to learn it for code generation	2010-01-25 23:54:32
<Shelwien>	well, i don't really understand it myself	2010-01-25 23:54:56
	also i had to use some windows perl version here	2010-01-25 23:55:28
	which required these "binmode" lines because of LF/CRLF stuff	2010-01-25 23:55:49
	but still i don't know anything more efficient for text manipulation	2010-01-25 23:58:42
<toffer>	afair it was intended exactly for that	2010-01-25 23:59:47
<Shelwien>	$c = join "", map chr($_+ord('a')), @b;	2010-01-25 23:59:54
	for example, this produces exactly the same file like you uploaded	2010-01-26 00:00:15
*** scott___ has joined the channel		2010-01-26 00:05:37
	anyway, somehow your version with +'a' really results in better compression with paq	2010-01-26 00:05:46
	6634 vs 6655 using paq8p -7	2010-01-26 00:06:10
	6291 vs 6334 using paq8p -7 and with added grayscale bmp headers	2010-01-26 00:06:36
<toffer>	so that's already 1/5	2010-01-26 00:08:09
	i guess reaching 1/4 with specialised compression will be possible	2010-01-26 00:08:50
<Shelwien>	actually paq is usually fairly bad at table compression	2010-01-26 00:16:16
	so a tuned simple specialized coder should do better than paq	2010-01-26 00:16:39
*** scott___ has left the channel		2010-01-26 00:25:13
<toffer>	hm just storing it hierarchically	2010-01-26 00:26:16
	already gets it down to 10k	2010-01-26 00:26:20
<Shelwien>	note that paq just sequentially compresses it	2010-01-26 00:26:42
<toffer>	yep	2010-01-26 00:27:12
	guess there's really a lot of room for improvement	2010-01-26 00:27:21
<Shelwien>	...and bmp looks symmetric, as expected...	2010-01-26 00:27:35
<toffer>	symmetric?	2010-01-26 00:28:05
<Shelwien>	http://nishi.dreamhosters.com/u/enwik7_le.bmp	2010-01-26 00:28:24
<toffer>	top left is codelen[0][0] and bottom right is [255][255] ?	2010-01-26 00:29:14
	i don't see a lot of symmetry	2010-01-26 00:30:35
	but i see what i saw when looking at the dumps already	2010-01-26 00:30:44
<Shelwien>	diagonal symmetry	2010-01-26 00:30:46
<toffer>	nearby contexts are correlated	2010-01-26 00:30:49
	diagonal?	2010-01-26 00:31:14
<Shelwien>	XY vs YX	2010-01-26 00:33:03
<toffer>	locally maybe	2010-01-26 00:33:19
<Shelwien>	its 0;0 is likely bottom left btw	2010-01-26 00:33:33
<toffer>	a scanline left to right is a context?	2010-01-26 00:35:24
	now i know why i didn't see the symmetry	2010-01-26 00:36:07
	wrong layout interpretation	2010-01-26 00:36:14
	well when looking at the output	2010-01-26 00:38:04
	i found correlations at i-256 and i-1	2010-01-26 00:38:14
	i mean just by looking at the text output	2010-01-26 00:38:35
	guess it's pretty clear in that image	2010-01-26 00:38:41
<Shelwien>	:)	2010-01-26 00:38:49
<toffer>	anyway	2010-01-26 00:54:19
	i gonna sleep now	2010-01-26 00:54:23
	guess tomorrow there gonna be something to tune/test	2010-01-26 00:54:33
	but there's no match model in the decomposition branch	2010-01-26 00:54:51
	yet	2010-01-26 00:55:00
	gn8	2010-01-26 00:55:11
*** toffer has left the channel		2010-01-26 00:55:38
<Shelwien>	!next	2010-01-26 01:24:27