*** STalKer-X has joined the channel		2010-01-21 04:24:57
*** STalKer-Y has left the channel		2010-01-21 04:24:58
*** scott___ has joined the channel		2010-01-21 04:25:16
<scott___>	hi :)	2010-01-21 04:26:28
	its not much but its a start to pack acgt files binary bijective and back http://bijective.dogma.net/dna.zip	2010-01-21 04:28:23
	later	2010-01-21 04:28:43
*** scott___ has left the channel		2010-01-21 04:28:52
*** pinc has joined the channel		2010-01-21 07:33:13
<STalKer-X>	hey	2010-01-21 12:25:47
	Shelwien: what do you think about http://encode.dreamhosters.com/showthread.php?p=10807#post10807 ?	2010-01-21 12:25:57
*** toffer has joined the channel		2010-01-21 12:28:18
<toffer>	hi	2010-01-21 12:29:02
<STalKer-X>	heh	2010-01-21 12:37:58
<toffer>	any news in here	2010-01-21 12:49:45
	guess not that much	2010-01-21 12:49:49
<STalKer-X>	just asked Shelwien about http://encode.dreamhosters.com/showthread.php?p=10807#post10807 :)	2010-01-21 12:54:41
	but he is still sleeping	2010-01-21 12:54:48
<toffer>	that looks rather esotheric to me	2010-01-21 13:00:13
	i didn't pay much attention though	2010-01-21 13:00:25
<STalKer-X>	heh	2010-01-21 13:00:47
	maybe if someone actually tried it o_o	2010-01-21 13:01:15
<toffer>	i'd really say i won't bother to look at that explaination at the website	2010-01-21 13:06:33
	i might be biased but that doesn't look interesting to me	2010-01-21 13:07:05
	are you writing any compressors btw?	2010-01-21 13:26:06
<STalKer-X>	none	2010-01-21 13:26:24
<toffer>	just lurking around - or why are you here?	2010-01-21 13:32:20
<STalKer-X>	curiosity	2010-01-21 13:54:31
<toffer>	if you're courious why not just try to program something?	2010-01-21 13:57:17
<STalKer-X>	i cannot program :(	2010-01-21 13:59:54
<toffer>	you can still try to learn	2010-01-21 14:00:41
	what's your education despite that?	2010-01-21 14:00:50
<STalKer-X>	student	2010-01-21 14:00:58
<toffer>	and what're you studying?	2010-01-21 14:01:30
<STalKer-X>	earth science	2010-01-21 14:01:53
<toffer>	erm?	2010-01-21 14:02:27
	what is that	2010-01-21 14:03:38
<STalKer-X>	never heard of it? :o	2010-01-21 14:04:47
<toffer>	nope	2010-01-21 14:04:50
<STalKer-X>	geologie, geowissenschaften, geotechnologie :)	2010-01-21 14:06:55
<toffer>	dann sag doch gleich geologie	2010-01-21 14:07:36
	darunter kann ich mir ehr was vorstellen	2010-01-21 14:07:43
<STalKer-X>	heisst auf englisch eben earth science :p	2010-01-21 14:08:06
<toffer>	übrigens ist dimitry shkarin (ppmonstr&co) auch geologe -	2010-01-21 14:08:09
	ich kenn nur "geology"	2010-01-21 14:08:39
	aber kein "earth science"	2010-01-21 14:08:48
	und wo studierst du?	2010-01-21 14:09:43
	away for coffee	2010-01-21 14:15:25
<STalKer-X>	tu berlin ;)	2010-01-21 14:19:36
*** pmcontext has joined the channel		2010-01-21 15:11:29
<pmcontext>	Hi	2010-01-21 15:12:01
	hi toffer u there	2010-01-21 15:15:03
<toffer>	hi	2010-01-21 15:51:55
	now, yes	2010-01-21 15:51:57
<pmcontext>	wb	2010-01-21 15:52:03
	any luck with derivation :P	2010-01-21 15:52:41
<toffer>	i didn't continue	2010-01-21 15:53:02
	the first time i derived it it took quite some time	2010-01-21 15:53:24
	i was supposed to use some tricks and reorderings	2010-01-21 15:53:48
<pmcontext>	what do we do then o.o	2010-01-21 15:54:02
<toffer>	but you can still use a different approach for updating your linear weights	2010-01-21 15:54:06
	i mean iterative, numeric minimization	2010-01-21 15:54:24
<pmcontext>	ok how does it work master	2010-01-21 15:55:03
<toffer>	the most simple technique would be gradient descent	2010-01-21 15:55:25
	you're familiar with that?	2010-01-21 15:55:33
<pmcontext>	i heard about it a few times	2010-01-21 15:56:12
<toffer>	just have a look at http://en.wikipedia.org/wiki/Gradient_descent to get a basic understanding	2010-01-21 15:56:43
<pmcontext>	ok	2010-01-21 15:56:50
<toffer>	mirroring the counter cost function we want to minimize H(w)	2010-01-21 16:00:45
<pmcontext>	i see	2010-01-21 16:01:44
	wat is H(w)	2010-01-21 16:01:56
<toffer>	H(w) = -sum i=1..k d_i ( y_i ln p_i + (1-y_i) ln (1-p_i) ), with d_i=c^(i-1), p_i = (1-w) p_0i + w p_1i	2010-01-21 16:02:45
	and with gradient descent	2010-01-21 16:03:20
	you do it like that	2010-01-21 16:03:23
	initially you set w=w0 (i.e. w0=0.5)	2010-01-21 16:03:38
<pmcontext>	ok w0 is 0.5	2010-01-21 16:04:01
<toffer>	than you calculate w' = w - L dH(w)/dw	2010-01-21 16:04:27
	where L is some small constant	2010-01-21 16:04:32
<pmcontext>	i see so l is controls the amount of change in w ?	2010-01-21 16:04:59
	L*	2010-01-21 16:05:06
<toffer>	not exactly	2010-01-21 16:05:13
	it is some damping constant	2010-01-21 16:05:20
<pmcontext>	oh i get	2010-01-21 16:05:30
<toffer>	normally L would be computed in each iteration	2010-01-21 16:05:31
	but using the exact method you'd get unacceptable speeds	2010-01-21 16:06:00
	even just with a single mixer	2010-01-21 16:06:10
<pmcontext>	very slow if we iterate i guess	2010-01-21 16:06:16
<toffer>	thus it (has to be) enough to do just one iteration	2010-01-21 16:06:42
	it'd be best if you write it down on a paper and find dH/dw	2010-01-21 16:07:19
<pmcontext>	ok	2010-01-21 16:07:57
	is there any rule about derivation of a summation	2010-01-21 16:08:37
<toffer>	well... d/dt(x + y) = dx/dt + dy/dt	2010-01-21 16:09:12
	for finite sums you can interchange the order of summation/derivation	2010-01-21 16:09:36
	actually you should face a problem right after finding the derivative (w/o simplifications)	2010-01-21 16:10:13
<pmcontext>	sure i can face problem even before ^_^	2010-01-21 16:10:53
<toffer>	dH/dw = - sum i=1..k d_i dp_i/dw ( y_i/p_i - (1-y_i)/(1-p_i ) )	2010-01-21 16:12:13
	got that?	2010-01-21 16:12:33
<pmcontext>	yes i got the ( y_i/p_i - (1-y_i)/(1-p_i ) ) , but missed out dp_i/dw	2010-01-21 16:12:56
<toffer>	question: is the gradient linear in w?	2010-01-21 16:13:22
<pmcontext>	hmm not sure o.o	2010-01-21 16:14:44
<toffer>	( y_i/p_i - (1-y_i)/(1-p_i ) ) where is w in there?	2010-01-21 16:14:57
<pmcontext>	there isnt	2010-01-21 16:15:06
<toffer>	there is	2010-01-21 16:15:24
	p_i = (1-w) p_0i + w p_1i -> the mixed probability	2010-01-21 16:15:36
	now again - is dH/dw linear in w?	2010-01-21 16:16:05
<pmcontext>	one sec i will check my book	2010-01-21 16:18:23
<toffer>	you can actually see it	2010-01-21 16:18:36
	for example there is the expression y_i/p_i	2010-01-21 16:18:46
	= y_i/( (1-w) p_0i + w p_1i) thus that expression is ~ 1/w	2010-01-21 16:19:09
	thus the gradient is nonlinear in w	2010-01-21 16:19:17
	as a consequence you cannot easily compute the exact gradient at each step	2010-01-21 16:19:37
	that's the problem i meant	2010-01-21 16:19:44
<pmcontext>	o.o so p is formed by w , im not sure wat is meant by linear in w	2010-01-21 16:20:10
<toffer>	well a linear function	2010-01-21 16:20:20
	f(x) = ax + b is linear in x	2010-01-21 16:20:26
<pmcontext>	coz there is no higher powers of x ?	2010-01-21 16:20:39
<toffer>	f(x) = 1/(ax + b) is nonlinear in x	2010-01-21 16:20:39
<pmcontext>	i see	2010-01-21 16:20:55
<toffer>	ok	2010-01-21 16:21:01
<pmcontext>	ok i understand little	2010-01-21 16:21:15
<toffer>	and a further simplification one assumes that the weight was optimal in each step prior the the current, thus the sum in dH/dw vanishes	2010-01-21 16:21:16
	but did you understand that you cannot easily compute dH/dw	2010-01-21 16:21:39
	since it's nonlinear in w	2010-01-21 16:21:46
<pmcontext>	yes it is due to the 1/w	2010-01-21 16:22:07
	and the summation	2010-01-21 16:22:24
<toffer>	as to linear - a function f is linear in x, when it can be formulated like f(x)=ax+b	2010-01-21 16:22:32
	dH/dw = - sum i=1..k d_i dp_i/dw ( y_i/p_i - (1-y_i)/(1-p_i ) )	2010-01-21 16:22:41
	that's the gradient	2010-01-21 16:22:46
<pmcontext>	so we remove summation to make it simpler ?	2010-01-21 16:23:04
<toffer>	now suppose that in previous steps dH/dw for i>1 the terms vanish. that means you suppose that the weight w was optimal in previous iterations	2010-01-21 16:23:37
	yes	2010-01-21 16:23:40
	otherwise you cannot compute it in an acceptable time	2010-01-21 16:23:55
<pmcontext>	i see	2010-01-21 16:24:15
<toffer>	thus you approximate	2010-01-21 16:24:22
	dH/dw = d_1 dp_1/dw ( y_1/p_1 - (1-y_1)/(1-p_1)	2010-01-21 16:24:54
	d_1 = c^(1-1) = 1	2010-01-21 16:25:04
	and p_1 = (1-w) p_01 + w p_11	2010-01-21 16:25:30
	= (p_11-p_01) w + p_01 = dp_1/dw w + p_01	2010-01-21 16:26:06
<pmcontext>	ah this looks much better	2010-01-21 16:26:22
<toffer>	still it contains a division	2010-01-21 16:26:34
	which can make it slow	2010-01-21 16:26:38
	maks	2010-01-21 16:26:58
	makes	2010-01-21 16:26:59
	but that can already be an acceptible solution	2010-01-21 16:27:11
	it can be rewritten	2010-01-21 16:27:52
<pmcontext>	in more simpler way ?	2010-01-21 16:28:09
<toffer>	( y_1/p_1 - (1-y_1)/(1-p_1) ) = (y_i-p_i)/(p_i (1-p_i) )	2010-01-21 16:29:06
	y_i - p_i is the prediction error	2010-01-21 16:29:14
	for example	2010-01-21 16:29:19
	the mixer predicted p_i=0.8	2010-01-21 16:29:27
	and y_i was 1	2010-01-21 16:29:33
	the error is 0.2	2010-01-21 16:29:36
<pmcontext>	oh i see	2010-01-21 16:29:45
<toffer>	now suppose the following	2010-01-21 16:29:47
	we define a function g(x) = 1/(x*(1-x)) for x in (0, 1)	2010-01-21 16:30:15
<pmcontext>	y_i is the encoded bit right ?	2010-01-21 16:30:19
<toffer>	yes	2010-01-21 16:30:22
	whops, i mixed up indices	2010-01-21 16:30:32
	since it's the current iteration i=1	2010-01-21 16:30:42
	or you can simply drop inidices	2010-01-21 16:30:47
<pmcontext>	oh ok	2010-01-21 16:30:51
<toffer>	question: what do you do with g(x) ?	2010-01-21 16:31:01
	to simply processing	2010-01-21 16:31:06
<pmcontext>	but it is 1/0 for x=0 or 1	2010-01-21 16:31:55
<toffer>	that's why i've written x in (0, 1) (exclusive interval)	2010-01-21 16:32:20
<pmcontext>	ohhh i see	2010-01-21 16:32:29
*** pinc has left the channel		2010-01-21 16:32:39
<toffer>	(0, 1) = [0, 1] / {0} / {1}	2010-01-21 16:32:40
	again	2010-01-21 16:32:51
	what do you use g(x) for	2010-01-21 16:32:54
<pmcontext>	as a damper ?	2010-01-21 16:33:16
<toffer>	no	2010-01-21 16:33:26
	let me rewrite it again	2010-01-21 16:33:29
<pmcontext>	oh i think g(x) give a big value for x (0,1)	2010-01-21 16:34:04
<toffer>	dH/dw = dp/dw (y-p) 1/(p (1-p))	2010-01-21 16:34:08
	dH/dw = dp/dw (y-p) g(p)	2010-01-21 16:34:35
<pmcontext>	i see	2010-01-21 16:34:44
<toffer>	just to eliminate the division	2010-01-21 16:34:56
	since it gonna be slow	2010-01-21 16:34:59
<pmcontext>	but what is g(p) meaning	2010-01-21 16:35:07
<toffer>	oh i forgot	2010-01-21 16:35:08
	it's just a term appearing in the simplyfied form of dH/dw	2010-01-21 16:35:22
<pmcontext>	i understnad y-p is prediction error , but g(p) o.o	2010-01-21 16:35:38
<toffer>	dH/dw = dp/dw [y/p - (1-y)/(1-p)]	2010-01-21 16:36:04
	and	2010-01-21 16:36:08
	y/p - (1-y)/(1-p) = (y-p)/(p(1-p))	2010-01-21 16:36:29
	just calculate it yourself	2010-01-21 16:36:39
	that y-p would be there if you'd minimize the rms (y-p)^2	2010-01-21 16:37:02
	but were're supposed to do compression, thus minimizing the entropy	2010-01-21 16:37:18
<pmcontext>	afte simplify we get (y-p) 1/(p (1-p)) , and g(p) = 1/(p(1-p))	2010-01-21 16:37:32
<toffer>	yes	2010-01-21 16:37:39
<pmcontext>	i see	2010-01-21 16:37:57
<toffer>	and that term g(p) rescales the quadratic metric into an entropy metric - but this would be a very bloomy interpretation	2010-01-21 16:38:30
	again	2010-01-21 16:38:52
	how does dH/dw look like alltogether?	2010-01-21 16:39:02
<pmcontext>	scarry dp/dw [y/p - (1-y)/(1-p)]	2010-01-21 16:39:18
<toffer>	and rewritten with g(p) ?	2010-01-21 16:39:35
	and expanding dp/dw	2010-01-21 16:39:51
<pmcontext>	after try to simplify dp/dw (y-p)g(p)	2010-01-21 16:39:56
<toffer>	dH/dw = (p1-p0) (y-p) 1/(p(1-p)) = (p1-p0) (y-p) g(p)	2010-01-21 16:40:44
	with p=(p1-p0) w + p0 = (1-w) p0 + w p1	2010-01-21 16:41:09
	now there's another issue	2010-01-21 16:41:27
	since we use limited precision	2010-01-21 16:41:36
	and numerica stability has to be guaranteed	2010-01-21 16:41:44
<pmcontext>	u mean it can go crazy ?	2010-01-21 16:42:09
<toffer>	yes	2010-01-21 16:42:15
	you should only adjust the weight if g(p) is in a limited range	2010-01-21 16:42:19
	and check its bounds	2010-01-21 16:42:23
<pmcontext>	i see	2010-01-21 16:42:40
<toffer>	1. bounds	2010-01-21 16:42:49
	w' = w + dH/dw	2010-01-21 16:42:59
	and since w actually is a probability	2010-01-21 16:43:08
	it has to be limited in [0, 1]	2010-01-21 16:43:17
<pmcontext>	yes it should be in some fixed range	2010-01-21 16:43:20
	can i use [0, 64] ?	2010-01-21 16:43:31
<toffer>	thus w' = max(0, min(1, w+dH/dw) )	2010-01-21 16:43:32
	if you like to work with 6 bit precision, yes	2010-01-21 16:43:43
	but you can use more	2010-01-21 16:43:48
	and that'd help greatly	2010-01-21 16:43:53
<pmcontext>	oh ok more	2010-01-21 16:43:53
<toffer>	i typically use 12..15 bits	2010-01-21 16:44:03
	and secondly	2010-01-21 16:44:14
<pmcontext>	ok	2010-01-21 16:44:20
<toffer>	if p is near 0	2010-01-21 16:44:22
	or near 1	2010-01-21 16:44:25
	you won't want to adjust weihts	2010-01-21 16:44:30
	and "near" can be classified by some threshold t	2010-01-21 16:44:44
	thus	2010-01-21 16:44:46
	if p<t or p>=1-t don'T do anything	2010-01-21 16:44:58
	and to make things more efficient	2010-01-21 16:45:05
	you can say	2010-01-21 16:45:13
	w' = w + dH/dw -> becomes w' = w for the given condition	2010-01-21 16:45:34
	to avoid comparsions	2010-01-21 16:45:47
	you set g(p) = 0 for p<t or p>=t	2010-01-21 16:46:00
<pmcontext>	oh	2010-01-21 16:46:08
	so that makes large change to w if i update	2010-01-21 16:46:21
<toffer>	thus dH/dw = dH/dw = (p1-p0) (y-p) g(p) = (p1-p0) (y-p) * 0 = 0	2010-01-21 16:46:27
	you avoid large changes	2010-01-21 16:46:36
<pmcontext>	oh i see	2010-01-21 16:46:41
<toffer>	so what you're now supposed to do	2010-01-21 16:46:50
	is to write a mixer class	2010-01-21 16:46:58
	and a function class	2010-01-21 16:47:08
	the function class should be a lookup for g(p)	2010-01-21 16:47:17
<pmcontext>	g(p) is pre calculated ?	2010-01-21 16:47:41
<toffer>	yes	2010-01-21 16:47:44
<pmcontext>	ok	2010-01-21 16:47:54
<toffer>	oh	2010-01-21 16:47:59
	i forgot	2010-01-21 16:48:01
	w' = w + L dH/dw	2010-01-21 16:48:08
<pmcontext>	damping o.o	2010-01-21 16:48:14
<toffer>	yes	2010-01-21 16:48:23
<pmcontext>	but isnt float bad ?	2010-01-21 16:48:38
<toffer>	that L can be simplified into something like L=2^-B	2010-01-21 16:48:40
	thus you can simply write dH/dw >> B	2010-01-21 16:48:51
<pmcontext>	oh	2010-01-21 16:48:58
	ok	2010-01-21 16:49:03
<toffer>	that isn't float based	2010-01-21 16:49:23
	you can use fixed point math	2010-01-21 16:49:26
<pmcontext>	ok i und it is power of 2	2010-01-21 16:49:34
	so i can shift	2010-01-21 16:49:44
	to divide	2010-01-21 16:49:50
<toffer>	like [0, 1) maps to [0, 4095)	2010-01-21 16:49:51
	right	2010-01-21 16:50:03
	erm	2010-01-21 16:50:08
	like [0, 1) maps to [0, 4096)	2010-01-21 16:50:17
<pmcontext>	ok [0,4095]	2010-01-21 16:50:32
<toffer>	note that the implementation is sensitive to values the assignment of precisions and to the value of L	2010-01-21 16:51:04
	selecting wrong values can cause some trouble	2010-01-21 16:51:16
	than there's another weighting scheme which can be derived in a similar manner	2010-01-21 16:51:45
	but i guess that's enough for today	2010-01-21 16:52:04
	ok?	2010-01-21 16:52:11
<pmcontext>	yes	2010-01-21 16:52:13
	dH/dw = (p1-p0) (y-p) g(p) and g(p) = 1/(p(1-p))	2010-01-21 16:52:18
	w' = w + L dH/dw	2010-01-21 16:52:42
	w' needs bounds check	2010-01-21 16:53:07
	and g(p) = 0 if p < t1 or p > t2 near the corner	2010-01-21 16:53:45
	L is power of 2 to avoid division	2010-01-21 16:54:09
<toffer>	right	2010-01-21 16:54:11
	but now please double check if there's no sign error	2010-01-21 16:54:22
	i might have missed a sign while writing	2010-01-21 16:54:32
	(in dH/dw)	2010-01-21 16:55:03
	so happy programming ^^	2010-01-21 16:55:18
<pmcontext>	o.o thank you	2010-01-21 16:55:23
<toffer>	the sign error can blow up everything, if any	2010-01-21 16:55:37
<pmcontext>	i will figure out the sign	2010-01-21 16:55:44
<toffer>	since it'd mean it maximimizes the entropy instead of minimizing	2010-01-21 16:55:53
<pmcontext>	yes i und	2010-01-21 16:56:00
	i will check value of w for each update	2010-01-21 16:56:13
	and its L dH/dw	2010-01-21 16:56:30
	i save the chat ^^	2010-01-21 16:57:51
	for reference	2010-01-21 16:58:08
<toffer>	good idea	2010-01-21 16:59:56
	and as i said in ordinary minimization problems L would be calculated with some similar scheme every step. but that's unacceptably slow in our case	2010-01-21 17:00:41
	there's another way to include an L which is not a power of two, btw	2010-01-21 17:01:21
	you set	2010-01-21 17:01:23
	w' = w + d	2010-01-21 17:01:29
	d = L dH/dw	2010-01-21 17:01:39
	d = L (p1-p0) (y-p) g(p) = (p1-p0) (y-p) (L g(p)) and you set L g(p) = h(p) or something like that	2010-01-21 17:02:24
<pmcontext>	oh	2010-01-21 17:09:12
	but we make lookup only for g(p) ?	2010-01-21 17:10:03
	or L*g(p)	2010-01-21 17:10:12
<toffer>	you can do that as you like	2010-01-21 17:10:53
	i'd say just try g(p)	2010-01-21 17:11:01
<pmcontext>	ok	2010-01-21 17:11:22
<toffer>	for simplicity	2010-01-21 17:11:32
<Shelwien>	hi	2010-01-21 17:11:42
	are you implementing logistic mixing there or what? :)	2010-01-21 17:11:52
<pmcontext>	o.o no logistic , coz i duno it yet	2010-01-21 17:13:47
<toffer>	linear mixing with gradient descent optimization	2010-01-21 17:14:02
<pmcontext>	that is meaning of logistic ?	2010-01-21 17:14:22
<Shelwien>	no	2010-01-21 17:14:28
<toffer>	i was just answering shelwien	2010-01-21 17:14:48
<Shelwien>	but update is kinda similar for logistic	2010-01-21 17:14:50
<toffer>	shelwiens question	2010-01-21 17:14:53
	yes	2010-01-21 17:14:56
	sicne it's gradient based	2010-01-21 17:15:00
	and there is some d ln(...) / dw involved	2010-01-21 17:15:13
	that's why it looks similar	2010-01-21 17:15:17
<Shelwien>	anyway, i'd suggest to try something simpler first	2010-01-21 17:15:19
	like finding the context with the best prediction	2010-01-21 17:15:55
	( max(bit?p:1-p) )	2010-01-21 17:16:10
	and incrementing its freq	2010-01-21 17:16:26
<pmcontext>	i did this and with my o3+o2+o1	2010-01-21 17:16:44
<Shelwien>	then using these freqs as weights	2010-01-21 17:16:45
	and?	2010-01-21 17:16:57
<pmcontext>	it is a bit slow but book1 276016	2010-01-21 17:17:55
<toffer>	with my genetic optimizer i found that two optimal context for book1 are o2 and o5	2010-01-21 17:18:24
	to be specific	2010-01-21 17:18:27
	0x405fff and (1<<40)-1	2010-01-21 17:18:38
*** Shelwien has left the channel		2010-01-21 17:18:42
*** Shelwien has joined the channel		2010-01-21 17:18:45
<pmcontext>	also the weight was limited to [0 to 500]	2010-01-21 17:18:51
	book1 276016 book2 193998	2010-01-21 17:19:31
<toffer>	it'd be good if you'd benchmark on enwik7, too	2010-01-21 17:19:53
<Shelwien>	well, first, we need to clear up this thing about orders	2010-01-21 17:20:10
	as i said before, t[c0][ctx]	2010-01-21 17:20:49
<toffer>	i'm downtown now	2010-01-21 17:20:57
	to have some beer	2010-01-21 17:21:02
<Shelwien>	with ctx being the bit context in the byte	2010-01-21 17:21:07
<toffer>	gonna be back in 4 hours or so	2010-01-21 17:21:08
	good luck	2010-01-21 17:21:12
<Shelwien>	same to you :)	2010-01-21 17:21:19
	so, despite being 2D, that's order0	2010-01-21 17:22:25
	so, the question is what you call "o3+o2+o1" there	2010-01-21 17:22:51
<pmcontext>	well i think i type wrong	2010-01-21 17:23:13
<Shelwien>	if its o0+o1+o2 actually, then 276k is ok in fact	2010-01-21 17:23:16
<pmcontext>	o2 + o1 + o0	2010-01-21 17:23:22
<Shelwien>	then, by adding real o3 there, you should get around 233k	2010-01-21 17:23:54
<pmcontext>	i tried but it seem to use up much memory and stops	2010-01-21 17:24:59
<Shelwien>	well, you need to use some better data structure	2010-01-21 17:25:16
	did you try STL map at least?	2010-01-21 17:25:35
<pmcontext>	oh i used an array	2010-01-21 17:25:48
<Shelwien>	for o3? thats crazy	2010-01-21 17:25:57
<pmcontext>	yes i realise now ><	2010-01-21 17:26:04
	how can i use less memory ?	2010-01-21 17:26:18
<Shelwien>	though you can deal with arrays too, to some extent	2010-01-21 17:26:20
	for example, lets consider o1	2010-01-21 17:26:30
	its 256 probability distribution tables, one for each prefix byte value	2010-01-21 17:26:53
	so, if instead of a single o1[256][256]	2010-01-21 17:27:10
	you'd make p2[256] with pointers to tables for single contexts	2010-01-21 17:28:00
	and allocate these dynamically	2010-01-21 17:28:08
	then you'd use much less memory even for o1	2010-01-21 17:28:19
<pmcontext>	i see	2010-01-21 17:28:28
<Shelwien>	as normally not all byte values are present in text	2010-01-21 17:28:33
<pmcontext>	i allocate only after i see a byte ?	2010-01-21 17:29:03
<Shelwien>	then, in a similar way, you can make tables of pointers to pointers, for o2, etc	2010-01-21 17:29:08
	yes	2010-01-21 17:29:16
	i'd say it differently though	2010-01-21 17:29:28
	if you see a null context pointer value in the table	2010-01-21 17:29:47
	you allocate and init that table	2010-01-21 17:30:08
<pmcontext>	ok , and to encode that bit i dont use this order coz it empty ?	2010-01-21 17:31:16
<Shelwien>	you can still init and use it after that	2010-01-21 17:31:32
<pmcontext>	oh ok	2010-01-21 17:31:48
<Shelwien>	though of course its better to skip it at this point	2010-01-21 17:31:52
<pmcontext>	we mix the orders which have tables right?	2010-01-21 17:32:16
<Shelwien>	as i said, we also can always mix all orders, if you need simplicity	2010-01-21 17:32:41
	after making sure that all the stats are allocated and initialized	2010-01-21 17:32:59
	...	2010-01-21 17:33:22
<pmcontext>	ok	2010-01-21 17:33:57
<Shelwien>	also, i'd suggest to consider some similarity there	2010-01-21 17:34:14
	for example, if you don't have current o1 allocated before	2010-01-21 17:34:48
	that also means that o2+ are not allocated as well	2010-01-21 17:34:58
<pmcontext>	yes	2010-01-21 17:35:09
<Shelwien>	so, the simpler implementation	2010-01-21 17:35:22
<pmcontext>	enwik8 36,645,743	2010-01-21 17:35:23
<Shelwien>	would be to store pointers to higher order with each symbol	2010-01-21 17:35:56
<pmcontext>	u mean like a tree ?	2010-01-21 17:36:25
<Shelwien>	of course :)	2010-01-21 17:36:37
	i just tried to explain that tree appears automatically here	2010-01-21 17:37:02
	when we try to simplify the algorithm :)	2010-01-21 17:37:24
<pmcontext>	but for each symbol there can be 256 child ? o.o how i point to so many	2010-01-21 17:37:45
<Shelwien>	what's the problem?	2010-01-21 17:38:05
<pmcontext>	i mean 256 different symbols can folow each symbol	2010-01-21 17:38:12
<Shelwien>	you have a basic context structure	2010-01-21 17:38:26
<pmcontext>	oh	2010-01-21 17:38:40
<Shelwien>	with 256 freqs... or 255 bitwise counters, don't matter	2010-01-21 17:38:45
	and i suggest to also add 256 pointers there	2010-01-21 17:38:59
<pmcontext>	i see	2010-01-21 17:39:43
<Shelwien>	so if you have access to statistics for symbol "a" in context "xxx"	2010-01-21 17:39:47
	that automatically means that you have a pointer to whole table for context "xxxa"	2010-01-21 17:40:04
	and if that pointer is null, you just have to allocate a new context struct	2010-01-21 17:40:30
	and store its address into the pointer	2010-01-21 17:40:39
<pmcontext>	i see	2010-01-21 17:41:00
<Shelwien>	btw stalker	2010-01-21 17:41:14
	that weird post was about representing some long numbers with chain fractions or something	2010-01-21 17:41:43
	the idea was to take 100 bits of data and compute some coefs for a formula, which would output the same value	2010-01-21 17:42:33
	and well, that's completely unrelated to compression	2010-01-21 17:43:02
<STalKer-X>	so instead compressing the data, you compress the coeffs	2010-01-21 17:43:28
<Shelwien>	yeah, but in general such transformation don't reduce the amount of information	2010-01-21 17:43:54
	(i mean, unless they're lossy transformations)	2010-01-21 17:44:18
	and for lossless compression its best to not perform any data transformation at all	2010-01-21 17:44:49
	because any transformations make it much harder to use some original correlations	2010-01-21 17:45:31
	a good example of that is BWT	2010-01-21 17:46:00
	if you'd transform some utf16 chinese text with it	2010-01-21 17:46:23
	the compression would be much worse than even LZ probably	2010-01-21 17:47:14
<pmcontext>	enwik8 25946148 with crazy hashtable mixer	2010-01-21 17:49:16
	very slow but	2010-01-21 17:49:28
<Shelwien>	well, i'm not that good with enwik results :)	2010-01-21 17:49:50
<pmcontext>	book1 242005	2010-01-21 17:50:06
<Shelwien>	now, that's not quite good for o3	2010-01-21 17:50:23
	should be 233-234k	2010-01-21 17:50:35
<pmcontext>	oh	2010-01-21 17:50:39
<Shelwien>	here's some o3 for you: http://shelwien.googlepages.com/o3bfa.rar	2010-01-21 17:51:11
<pmcontext>	wait this is other program that i wrote , not same mixer	2010-01-21 17:51:22
	i use fixed weight for 5 orders	2010-01-21 17:51:49
<Shelwien>	that's unrelated, i'm just saying what's good for o3 :)	2010-01-21 17:51:53
<pmcontext>	oh ok	2010-01-21 17:52:00
<Shelwien>	that coder there uses another approach to mixing	2010-01-21 17:52:45
	BFA aka Bruteforce Adaptation :)	2010-01-21 17:53:15
	it collects the codelengths for all parameter (like weight) values from a given set	2010-01-21 17:53:43
<pmcontext>	o.o 233k !! u implement very efficiently	2010-01-21 17:54:04
<Shelwien>	and uses the value providing the shortest code for actual coding	2010-01-21 17:54:08
	that's a program from 2002 :)	2010-01-21 17:55:10
*** toffer has left the channel		2010-01-21 18:15:30
*** scott___ has joined the channel		2010-01-21 18:34:03
<pmcontext>	i g2g bye and thanks	2010-01-21 18:34:12
*** pmcontext has left the channel		2010-01-21 18:34:40
<scott___>	hope I did not scare him away	2010-01-21 18:35:00
<STalKer-X>	nope	2010-01-21 18:35:59
<Shelwien>	hi	2010-01-21 18:36:47
	i checked comp.compression	2010-01-21 18:37:17
<scott___>	i may not reply quicikly but I have this thing on a tab I am in the mood to do some coding today	2010-01-21 18:37:22
	hi	2010-01-21 18:37:26
<Shelwien>	and beside your posts, there was also that guy with huffman compression for database records	2010-01-21 18:38:08
<scott___>	I thought alot about DNA compression and I have some ideas to try out that will hopefully work pretty neat	2010-01-21 18:38:17
	on here	2010-01-21 18:38:31
	the guy on comp.compression	2010-01-21 18:38:45
<Shelwien>	anyway, thanks to him, i just remembered about the possibility of bitcoding more compact than huffman :)	2010-01-21 18:39:53
<scott___>	was his name glenn	2010-01-21 18:40:45
<Shelwien>	yeah	2010-01-21 18:40:55
<scott___>	was he just on what was his nick	2010-01-21 18:41:05
<Shelwien>	he didn't appear here if you mean that :)	2010-01-21 18:41:44
	i just read his post in comp.compression and it reminded me about nonprefix coding	2010-01-21 18:42:33
	anyway, food calls, so later :)	2010-01-21 18:42:41
<scott___>	oh I think comp.compression a step done from here so I think I post there different;y than here	2010-01-21 18:43:10
	later then	2010-01-21 18:43:34
<Shelwien>	yesterday I found that finereader 10 finally has support for japanese in the main branch	2010-01-21 19:14:52
	(it seemed to have it long before too, but it was only distributed with SDK or something - wasn't able to find it)	2010-01-21 19:15:32
	but i guess it was to early to be happy about that :)	2010-01-21 19:16:04
	i tried OCRing a book	2010-01-21 19:16:09
	and it seems that its still bad for this type of symbols	2010-01-21 19:17:17
	in the end, even though finereader has some very nice GUI features	2010-01-21 19:18:05
	i guess readiris which is used before is still better for japanese, despite its very simplistic interface	2010-01-21 19:19:22
<scott___>	1139718 on e.coli and not done yet	2010-01-21 20:01:07
<Shelwien>	i guess we'd need to find more genetic data for this	2010-01-21 20:02:33
	i mean, e.coli compressor is an even more weird thing than book1 compressor :)	2010-01-21 20:03:07
<scott___>	well nature is making its own rules	2010-01-21 20:03:26
*** toffer has joined the channel		2010-01-21 21:00:42
<toffer>	hi	2010-01-21 21:02:06
<Shelwien>	hi	2010-01-21 21:16:05
	pmcontext expired	2010-01-21 21:16:17
	matt added an entry for acb: http://mattmahoney.net/dc/text.html#2185	2010-01-21 21:22:08