*** STalKer-X has joined the channel2010-01-21 04:24:57
*** STalKer-Y has left the channel2010-01-21 04:24:58
*** scott___ has joined the channel2010-01-21 04:25:16
<scott___> hi :)2010-01-21 04:26:28
 it's not much but it's a start to pack ACGT files to binary bijectively and back http://bijective.dogma.net/dna.zip2010-01-21 04:28:23
 later2010-01-21 04:28:43
*** scott___ has left the channel2010-01-21 04:28:52
*** pinc has joined the channel2010-01-21 07:33:13
<STalKer-X> hey2010-01-21 12:25:47
 Shelwien: what do you think about http://encode.dreamhosters.com/showthread.php?p=10807#post10807 ?2010-01-21 12:25:57
*** toffer has joined the channel2010-01-21 12:28:18
<toffer> hi2010-01-21 12:29:02
<STalKer-X> heh2010-01-21 12:37:58
<toffer> any news in here2010-01-21 12:49:45
 guess not that much2010-01-21 12:49:49
<STalKer-X> just asked Shelwien about http://encode.dreamhosters.com/showthread.php?p=10807#post10807 :)2010-01-21 12:54:41
 but he is still sleeping2010-01-21 12:54:48
<toffer> that looks rather esoteric to me2010-01-21 13:00:13
 i didn't pay much attention though2010-01-21 13:00:25
<STalKer-X> heh2010-01-21 13:00:47
 maybe if someone actually tried it o_o2010-01-21 13:01:15
<toffer> i'd really say i won't bother to look at that explanation on the website2010-01-21 13:06:33
 i might be biased but that doesn't look interesting to me2010-01-21 13:07:05
 are you writing any compressors btw?2010-01-21 13:26:06
<STalKer-X> none2010-01-21 13:26:24
<toffer> just lurking around - or why are you here?2010-01-21 13:32:20
<STalKer-X> curiosity2010-01-21 13:54:31
<toffer> if you're curious, why not just try to program something?2010-01-21 13:57:17
<STalKer-X> i cannot program :(2010-01-21 13:59:54
<toffer> you can still try to learn2010-01-21 14:00:41
 what's your education, apart from that?2010-01-21 14:00:50
<STalKer-X> student2010-01-21 14:00:58
<toffer> and what're you studying?2010-01-21 14:01:30
<STalKer-X> earth science2010-01-21 14:01:53
<toffer> erm?2010-01-21 14:02:27
 what is that2010-01-21 14:03:38
<STalKer-X> never heard of it? :o2010-01-21 14:04:47
<toffer> nope2010-01-21 14:04:50
<STalKer-X> geology, geosciences, geotechnology :)2010-01-21 14:06:55
<toffer> then just say geology2010-01-21 14:07:36
 that gives me a better idea of what it is2010-01-21 14:07:43
<STalKer-X> it's just called earth science in English :p2010-01-21 14:08:06
<toffer> by the way, Dmitry Shkarin (ppmonstr & co) is also a geologist -2010-01-21 14:08:09
 i only know "geology"2010-01-21 14:08:39
 but not "earth science"2010-01-21 14:08:48
 and where do you study?2010-01-21 14:09:43
 away for coffee2010-01-21 14:15:25
<STalKer-X> tu berlin ;)2010-01-21 14:19:36
*** pmcontext has joined the channel2010-01-21 15:11:29
<pmcontext> Hi2010-01-21 15:12:01
 hi toffer u there2010-01-21 15:15:03
<toffer> hi2010-01-21 15:51:55
 now, yes2010-01-21 15:51:57
<pmcontext> wb2010-01-21 15:52:03
 any luck with derivation :P2010-01-21 15:52:41
<toffer> i didn't continue2010-01-21 15:53:02
 the first time i derived it it took quite some time2010-01-21 15:53:24
 i had to use some tricks and reorderings2010-01-21 15:53:48
<pmcontext> what do we do then o.o2010-01-21 15:54:02
<toffer> but you can still use a different approach for updating your linear weights2010-01-21 15:54:06
 i mean iterative, numeric minimization2010-01-21 15:54:24
<pmcontext> ok how does it work master2010-01-21 15:55:03
<toffer> the most simple technique would be gradient descent2010-01-21 15:55:25
 you're familiar with that?2010-01-21 15:55:33
<pmcontext> i heard about it a few times2010-01-21 15:56:12
<toffer> just have a look at http://en.wikipedia.org/wiki/Gradient_descent to get a basic understanding2010-01-21 15:56:43
<pmcontext> ok2010-01-21 15:56:50
<toffer> mirroring the counter's cost function, we want to minimize H(w)2010-01-21 16:00:45
<pmcontext> i see2010-01-21 16:01:44
 what is H(w)2010-01-21 16:01:56
<toffer> H(w) = -sum i=1..k d_i ( y_i ln p_i + (1-y_i) ln (1-p_i) ), with d_i=c^(i-1), p_i = (1-w) p_0i + w p_1i2010-01-21 16:02:45
 and with gradient descent2010-01-21 16:03:20
 you do it like that2010-01-21 16:03:23
 initially you set w=w0 (i.e. w0=0.5)2010-01-21 16:03:38
<pmcontext> ok w0 is 0.52010-01-21 16:04:01
<toffer> then you calculate w' = w - L dH(w)/dw2010-01-21 16:04:27
 where L is some small constant2010-01-21 16:04:32
<pmcontext> i see, so L controls the amount of change in w?2010-01-21 16:04:59
<toffer> not exactly2010-01-21 16:05:13
 it is some damping constant2010-01-21 16:05:20
<pmcontext> oh i get it2010-01-21 16:05:30
<toffer> normally L would be computed in each iteration2010-01-21 16:05:31
 but using the exact method you'd get unacceptable speeds2010-01-21 16:06:00
 even just with a single mixer2010-01-21 16:06:10
<pmcontext> very slow if we iterate i guess2010-01-21 16:06:16
<toffer> thus it (has to be) enough to do just one iteration2010-01-21 16:06:42
 it'd be best if you write it down on a paper and find dH/dw2010-01-21 16:07:19
<pmcontext> ok 2010-01-21 16:07:57
 is there any rule for taking the derivative of a summation2010-01-21 16:08:37
<toffer> well... d/dt(x + y) = dx/dt + dy/dt2010-01-21 16:09:12
 for finite sums you can interchange the order of summation/derivation2010-01-21 16:09:36
 actually you should face a problem right after finding the derivative (w/o simplifications)2010-01-21 16:10:13
<pmcontext> sure i can face problem even before ^_^2010-01-21 16:10:53
<toffer> dH/dw = - sum i=1..k d_i dp_i/dw ( y_i/p_i - (1-y_i)/(1-p_i ) )2010-01-21 16:12:13
 got that?2010-01-21 16:12:33
<pmcontext> yes i got the ( y_i/p_i - (1-y_i)/(1-p_i ) ) , but missed out dp_i/dw2010-01-21 16:12:56
<toffer> question: is the gradient linear in w?2010-01-21 16:13:22
<pmcontext> hmm not sure o.o2010-01-21 16:14:44
<toffer> ( y_i/p_i - (1-y_i)/(1-p_i ) ) where is w in there?2010-01-21 16:14:57
<pmcontext> there isn't2010-01-21 16:15:06
<toffer> there is2010-01-21 16:15:24
 p_i = (1-w) p_0i + w p_1i -> the mixed probability2010-01-21 16:15:36
 now again - is dH/dw linear in w?2010-01-21 16:16:05
<pmcontext> one sec i will check my book2010-01-21 16:18:23
<toffer> you can actually see it2010-01-21 16:18:36
 for example there is the expression y_i/p_i2010-01-21 16:18:46
 = y_i/( (1-w) p_0i + w p_1i) thus that expression is ~ 1/w2010-01-21 16:19:09
 thus the gradient is nonlinear in w2010-01-21 16:19:17
 as a consequence you cannot easily compute the exact gradient at each step 2010-01-21 16:19:37
 that's the problem i meant2010-01-21 16:19:44
<pmcontext> o.o so p is formed by w, i'm not sure what is meant by linear in w2010-01-21 16:20:10
<toffer> well a linear function2010-01-21 16:20:20
 f(x) = ax + b is linear in x2010-01-21 16:20:26
<pmcontext> coz there are no higher powers of x ?2010-01-21 16:20:39
<toffer> f(x) = 1/(ax + b) is nonlinear in x2010-01-21 16:20:39
<pmcontext> i see2010-01-21 16:20:55
<toffer> ok2010-01-21 16:21:01
<pmcontext> ok i understand a little2010-01-21 16:21:15
<toffer> and as a further simplification one assumes that the weight was optimal in each step prior to the current one, thus all earlier terms of the sum in dH/dw vanish2010-01-21 16:21:16
 but did you understand that you cannot easily compute dH/dw2010-01-21 16:21:39
 since it's nonlinear in w2010-01-21 16:21:46
<pmcontext> yes it is due to the 1/w2010-01-21 16:22:07
 and the summation2010-01-21 16:22:24
<toffer> as to linear - a function f is linear in x, when it can be formulated like f(x)=ax+b2010-01-21 16:22:32
 dH/dw = - sum i=1..k d_i dp_i/dw ( y_i/p_i - (1-y_i)/(1-p_i ) )2010-01-21 16:22:41
 that's the gradient2010-01-21 16:22:46
<pmcontext> so we remove summation to make it simpler ?2010-01-21 16:23:04
<toffer> now suppose that the terms of dH/dw for i>1 vanish. that means you assume that the weight w was optimal in the previous iterations2010-01-21 16:23:37
 yes2010-01-21 16:23:40
 otherwise you cannot compute it in an acceptable time2010-01-21 16:23:55
<pmcontext> i see2010-01-21 16:24:15
<toffer> thus you approximate2010-01-21 16:24:22
 dH/dw = d_1 dp_1/dw ( y_1/p_1 - (1-y_1)/(1-p_1) )2010-01-21 16:24:54
 d_1 = c^(1-1) = 12010-01-21 16:25:04
 and p_1 = (1-w) p_01 + w p_112010-01-21 16:25:30
 = (p_11-p_01) w + p_01 = dp_1/dw w + p_012010-01-21 16:26:06
<pmcontext> ah this looks much better2010-01-21 16:26:22
<toffer> still it contains a division2010-01-21 16:26:34
 which makes it slow2010-01-21 16:26:38
 but that can already be an acceptable solution2010-01-21 16:27:11
 it can be rewritten2010-01-21 16:27:52
<pmcontext> in more simpler way ?2010-01-21 16:28:09
<toffer> ( y_1/p_1 - (1-y_1)/(1-p_1) ) = (y_i-p_i)/(p_i (1-p_i) )2010-01-21 16:29:06
 y_i - p_i is the prediction error2010-01-21 16:29:14
 for example2010-01-21 16:29:19
 the mixer predicted p_i=0.82010-01-21 16:29:27
 and y_i was 12010-01-21 16:29:33
 the error is 0.22010-01-21 16:29:36
<pmcontext> oh i see2010-01-21 16:29:45
<toffer> now suppose the following2010-01-21 16:29:47
 we define a function g(x) = 1/(x*(1-x)) for x in (0, 1)2010-01-21 16:30:15
<pmcontext> y_i is the encoded bit right ?2010-01-21 16:30:19
<toffer> yes2010-01-21 16:30:22
 whoops, i mixed up indices2010-01-21 16:30:32
 since it's the current iteration i=12010-01-21 16:30:42
 or you can simply drop indices2010-01-21 16:30:47
<pmcontext> oh ok2010-01-21 16:30:51
<toffer> question: what do you do with g(x) ?2010-01-21 16:31:01
 to simplify processing2010-01-21 16:31:06
<pmcontext> but it is 1/0 for x=0 or 12010-01-21 16:31:55
<toffer> that's why i've written x in (0, 1) (exclusive interval)2010-01-21 16:32:20
<pmcontext> ohhh i see2010-01-21 16:32:29
*** pinc has left the channel2010-01-21 16:32:39
<toffer> (0, 1) = [0, 1] \ {0, 1}2010-01-21 16:32:40
 again2010-01-21 16:32:51
 what do you use g(x) for2010-01-21 16:32:54
<pmcontext> as a damper ?2010-01-21 16:33:16
<toffer> no2010-01-21 16:33:26
 let me rewrite it again2010-01-21 16:33:29
<pmcontext> oh i think g(x) gives a big value for x in (0,1)2010-01-21 16:34:04
<toffer> dH/dw = dp/dw (y-p) 1/(p (1-p))2010-01-21 16:34:08
 dH/dw = dp/dw (y-p) g(p)2010-01-21 16:34:35
<pmcontext> i see2010-01-21 16:34:44
<toffer> just to eliminate the division2010-01-21 16:34:56
 since it's gonna be slow2010-01-21 16:34:59
<pmcontext> but what does g(p) mean2010-01-21 16:35:07
<toffer> oh i forgot2010-01-21 16:35:08
 it's just a term appearing in the simplified form of dH/dw2010-01-21 16:35:22
<pmcontext> i understand y-p is the prediction error, but g(p) o.o2010-01-21 16:35:38
<toffer> dH/dw = dp/dw [y/p - (1-y)/(1-p)]2010-01-21 16:36:04
 and2010-01-21 16:36:08
 y/p - (1-y)/(1-p) = (y-p)/(p(1-p))2010-01-21 16:36:29
 just calculate it yourself2010-01-21 16:36:39
 that y-p would be there if you'd minimize the rms (y-p)^22010-01-21 16:37:02
 but we're supposed to do compression, thus minimizing the entropy2010-01-21 16:37:18
<pmcontext> after simplifying we get (y-p) 1/(p (1-p)) , and g(p) = 1/(p(1-p))2010-01-21 16:37:32
<toffer> yes2010-01-21 16:37:39
<pmcontext> i see2010-01-21 16:37:57
<toffer> and that term g(p) rescales the quadratic metric into an entropy metric - but this would be a very *flowery* interpretation2010-01-21 16:38:30
 again2010-01-21 16:38:52
 what does dH/dw look like altogether?2010-01-21 16:39:02
<pmcontext> scary: dp/dw [y/p - (1-y)/(1-p)]2010-01-21 16:39:18
<toffer> and rewritten with g(p) ?2010-01-21 16:39:35
 and expanding dp/dw2010-01-21 16:39:51
<pmcontext> after trying to simplify: dp/dw (y-p) g(p)2010-01-21 16:39:56
<toffer> dH/dw = (p1-p0) (y-p) 1/(p(1-p)) = (p1-p0) (y-p) g(p)2010-01-21 16:40:44
 with p=(p1-p0) w + p0 = (1-w) p0 + w p12010-01-21 16:41:09
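
Collecting the formulas so far, a bare floating-point sketch of that update in C++ (names are mine; note that the leading minus of the original gradient is already absorbed into this form, so the step decreases the coding cost, and the stability safeguards toffer discusses next are still missing):

// p0, p1: input probabilities of a 1-bit, both strictly inside (0,1);
// y: the coded bit; w: the mixing weight; L: a small damping constant.
double mix_and_update(double p0, double p1, int y, double &w, double L) {
    double p = (1.0 - w) * p0 + w * p1;   // p = (1-w) p0 + w p1
    double g = 1.0 / (p * (1.0 - p));     // g(p)
    w += L * (p1 - p0) * (y - p) * g;     // w' = w + L dH/dw
    return p;                             // probability actually used for coding
}
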
 now there's another issue2010-01-21 16:41:27
 since we use limited precision2010-01-21 16:41:36
 and numerical stability has to be guaranteed2010-01-21 16:41:44
<pmcontext> u mean it can go crazy ?2010-01-21 16:42:09
<toffer> yes2010-01-21 16:42:15
 you should only adjust the weight if g(p) is in a limited range2010-01-21 16:42:19
 and check its bounds2010-01-21 16:42:23
<pmcontext> i see2010-01-21 16:42:40
<toffer> 1. bounds2010-01-21 16:42:49
 w' = w + dH/dw2010-01-21 16:42:59
 and since w actually is a probability2010-01-21 16:43:08
 it has to be limited in [0, 1]2010-01-21 16:43:17
<pmcontext> yes it should be in some fixed range2010-01-21 16:43:20
 can i use [0, 64] ?2010-01-21 16:43:31
<toffer> thus w' = max(0, min(1, w+dH/dw) )2010-01-21 16:43:32
 if you like to work with 6 bit precision, yes2010-01-21 16:43:43
 but you can use more2010-01-21 16:43:48
 and that'd help greatly2010-01-21 16:43:53
<pmcontext> oh ok more2010-01-21 16:43:53
<toffer> i typically use 12..15 bits2010-01-21 16:44:03
 and secondly2010-01-21 16:44:14
<pmcontext> ok2010-01-21 16:44:20
<toffer> if p is near 02010-01-21 16:44:22
 or near 12010-01-21 16:44:25
 you won't want to adjust the weights2010-01-21 16:44:30
 and "near" can be classified by some threshold t2010-01-21 16:44:44
 thus2010-01-21 16:44:46
 if p<t or p>=1-t don't do anything2010-01-21 16:44:58
 and to make things more efficient2010-01-21 16:45:05
 you can say2010-01-21 16:45:13
 w' = w + dH/dw -> becomes w' = w for the given condition2010-01-21 16:45:34
 to avoid comparisons2010-01-21 16:45:47
 you set g(p) = 0 for p<t or p>=1-t2010-01-21 16:46:00
<pmcontext> oh2010-01-21 16:46:08
 so that makes a large change to w if i update?2010-01-21 16:46:21
<toffer> thus dH/dw = (p1-p0) (y-p) g(p) = (p1-p0) (y-p) * 0 = 02010-01-21 16:46:27
 you avoid large changes2010-01-21 16:46:36
<pmcontext> oh i see2010-01-21 16:46:41
<toffer> so what you're now supposed to do2010-01-21 16:46:50
 is to write a mixer class2010-01-21 16:46:58
 and a function class2010-01-21 16:47:08
 the function class should be a lookup for g(p)2010-01-21 16:47:17
<pmcontext> g(p) is pre calculated ?2010-01-21 16:47:41
<toffer> yes2010-01-21 16:47:44
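
A sketch of that precomputed g(p) lookup in C++, assuming p is kept in 12-bit fixed point (p_q in 1..4095 standing for p = p_q/4096) and applying the thresholding from above so entries near 0 and 1 are zero; the concrete bit width and the value of t here are illustrative only:

#include <vector>

struct GTable {
    std::vector<double> g;                      // g[p_q] ~ 1 / (p (1-p))
    explicit GTable(int bits = 12, double t = 0.01) : g(size_t(1) << bits, 0.0) {
        int n = 1 << bits;
        for (int q = 1; q < n; ++q) {
            double p = double(q) / n;
            // g(p) = 0 for p < t or p >= 1-t, so no weight update happens there
            g[q] = (p < t || p >= 1.0 - t) ? 0.0 : 1.0 / (p * (1.0 - p));
        }
    }
    double operator()(int p_q) const { return g[p_q]; }
};
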
<pmcontext> ok2010-01-21 16:47:54
<toffer> oh2010-01-21 16:47:59
 i forgot2010-01-21 16:48:01
 w' = w + L dH/dw 2010-01-21 16:48:08
<pmcontext> damping o.o2010-01-21 16:48:14
<toffer> yes2010-01-21 16:48:23
<pmcontext> but isn't float bad?2010-01-21 16:48:38
<toffer> that L can be simplified into something like L=2^-B2010-01-21 16:48:40
 thus you can simply write dH/dw >> B2010-01-21 16:48:51
<pmcontext> oh2010-01-21 16:48:58
 ok2010-01-21 16:49:03
<toffer> that isn't float based2010-01-21 16:49:23
 you can use fixed point math2010-01-21 16:49:26
<pmcontext> ok i understand, it is a power of 22010-01-21 16:49:34
 so i can shift2010-01-21 16:49:44
 to divide2010-01-21 16:49:50
<toffer> like [0, 1) maps to [0, 4095)2010-01-21 16:49:51
 right2010-01-21 16:50:03
 erm2010-01-21 16:50:08
 like [0, 1) maps to [0, 4096)2010-01-21 16:50:17
<pmcontext> ok [0,4095]2010-01-21 16:50:32
<toffer> note that the implementation is sensitive to the assignment of precisions and to the value of L2010-01-21 16:51:04
 selecting wrong values can cause some trouble2010-01-21 16:51:16
 then there's another weighting scheme which can be derived in a similar manner2010-01-21 16:51:45
 but i guess that's enough for today2010-01-21 16:52:04
 ok?2010-01-21 16:52:11
<pmcontext> yes 2010-01-21 16:52:13
 dH/dw = (p1-p0) (y-p) g(p) and g(p) = 1/(p(1-p))2010-01-21 16:52:18
 w' = w + L dH/dw2010-01-21 16:52:42
 w' needs bounds check2010-01-21 16:53:07
 and g(p) = 0 if p < t1 or p > t2 near the corner2010-01-21 16:53:45
 L is power of 2 to avoid division2010-01-21 16:54:09
<toffer> right2010-01-21 16:54:11
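
Putting the whole recipe pmcontext just summarized into one fixed-point sketch (C++17): weight and probabilities in 12 bits, g(p) from a lookup whose near-0/near-1 entries are zeroed, the bounds check on w, and L = 2^-B realized as a shift. Every constant here is an illustrative choice, not a recommendation from the chat, and an arithmetic right shift is assumed for negative values:

#include <algorithm>
#include <cstdint>

class FixedMixer2 {
    static constexpr int32_t PBITS = 12, PMAX = 1 << PBITS;  // [0,1) maps to [0,4096)
    static constexpr int32_t B = 6;                          // L = 2^-B
    int32_t w = PMAX / 2;                                    // weight, w0 = 0.5
    int32_t gtab[PMAX];                                      // g(p) lookup table
public:
    explicit FixedMixer2(int32_t t = 40) {                   // threshold in 12-bit units
        for (int32_t q = 0; q < PMAX; ++q)
            gtab[q] = (q < t || q >= PMAX - t) ? 0           // no update near 0 or 1
                                               : PMAX * PMAX / (q * (PMAX - q));
    }
    // p0, p1 in (0, PMAX): scaled probabilities of a 1-bit. Returns the mixed p.
    int32_t mix(int32_t p0, int32_t p1) const {
        return p0 + (((p1 - p0) * w) >> PBITS);              // (1-w) p0 + w p1
    }
    // One update step after coding bit y (0 or 1) with the same p0, p1.
    void update(int32_t p0, int32_t p1, int y) {
        int32_t p = mix(p0, p1);
        int32_t err = (y << PBITS) - p;                      // (y - p), scaled by PMAX
        int64_t step = ((int64_t)(p1 - p0) * err * gtab[p]) >> PBITS;  // dH/dw, scaled
        w += (int32_t)(step >> B);                           // w' = w + L dH/dw
        w = std::min(PMAX, std::max<int32_t>(0, w));         // clamp w to [0, 1]
    }
};
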
 but now please double check if there's no sign error2010-01-21 16:54:22
 i might have missed a sign while writing2010-01-21 16:54:32
 (in dH/dw)2010-01-21 16:55:03
 so happy programming ^^2010-01-21 16:55:18
<pmcontext> o.o thank you2010-01-21 16:55:23
<toffer> a sign error, if there is one, can blow up everything2010-01-21 16:55:37
<pmcontext> i will figure out the sign2010-01-21 16:55:44
<toffer> since it'd mean it maximizes the entropy instead of minimizing it2010-01-21 16:55:53
<pmcontext> yes i understand2010-01-21 16:56:00
 i will check value of w for each update2010-01-21 16:56:13
 and its L dH/dw2010-01-21 16:56:30
 i save the chat ^^2010-01-21 16:57:51
 for reference 2010-01-21 16:58:08
<toffer> good idea2010-01-21 16:59:56
 and as i said in ordinary minimization problems L would be calculated with some similar scheme every step. but that's unacceptably slow in our case2010-01-21 17:00:41
 there's another way to include an L which is not a power of two, btw2010-01-21 17:01:21
 you set2010-01-21 17:01:23
 w' = w + d2010-01-21 17:01:29
 d = L dH/dw2010-01-21 17:01:39
 d = L (p1-p0) (y-p) g(p) = (p1-p0) (y-p) (L g(p)) and you set L g(p) = h(p) or something like that2010-01-21 17:02:24
<pmcontext> oh2010-01-21 17:09:12
 but we make lookup only for g(p) ?2010-01-21 17:10:03
 or L*g(p)2010-01-21 17:10:12
<toffer> you can do that as you like2010-01-21 17:10:53
 i'd say just try g(p)2010-01-21 17:11:01
<pmcontext> ok2010-01-21 17:11:22
<toffer> for simplicity2010-01-21 17:11:32
<Shelwien> hi2010-01-21 17:11:42
 are you implementing logistic mixing there or what? :)2010-01-21 17:11:52
<pmcontext> o.o no logistic, coz i dunno it yet2010-01-21 17:13:47
<toffer> linear mixing with gradient descent optimization2010-01-21 17:14:02
<pmcontext> is that the meaning of logistic?2010-01-21 17:14:22
<Shelwien> no2010-01-21 17:14:28
<toffer> i was just answering shelwien2010-01-21 17:14:48
<Shelwien> but update is kinda similar for logistic2010-01-21 17:14:50
<toffer> shelwiens question2010-01-21 17:14:53
 yes2010-01-21 17:14:56
 since it's gradient based2010-01-21 17:15:00
 and there is some d ln(...) / dw involved2010-01-21 17:15:13
 that's why it looks similar2010-01-21 17:15:17
<Shelwien> anyway, i'd suggest to try something simpler first2010-01-21 17:15:19
 like finding the context with the best prediction2010-01-21 17:15:55
 ( max(bit?p:1-p) )2010-01-21 17:16:10
 and incrementing its freq2010-01-21 17:16:26
<pmcontext> i did this and with my o3+o2+o12010-01-21 17:16:44
<Shelwien> then using these freqs as weights2010-01-21 17:16:45
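
A small C++ sketch of this simpler scheme as I read it: after each coded bit, find the submodel whose prediction was best (largest bit ? p : 1-p), increment its frequency, and use the normalized frequencies as linear mixing weights. The class name, N, and the double-valued probabilities are my own choices for illustration:

#include <array>

template <int N>
struct HitCountMixer {
    std::array<int, N> freq;                       // per-model hit counts

    HitCountMixer() { freq.fill(1); }              // start with uniform weights

    // Mix N probabilities of a 1-bit, using the counts as linear weights.
    double mix(const std::array<double, N> &p) const {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < N; ++i) { num += freq[i] * p[i]; den += freq[i]; }
        return num / den;
    }

    // After coding bit y, credit the model that predicted it best.
    void update(const std::array<double, N> &p, int y) {
        int best = 0;
        double bestq = -1.0;
        for (int i = 0; i < N; ++i) {
            double q = y ? p[i] : 1.0 - p[i];      // max(bit ? p : 1-p)
            if (q > bestq) { bestq = q; best = i; }
        }
        ++freq[best];
    }
};
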
 and?2010-01-21 17:16:57
<pmcontext> it is a bit slow but book1 2760162010-01-21 17:17:55
<toffer> with my genetic optimizer i found that the two optimal contexts for book1 are o2 and o52010-01-21 17:18:24
 to be specific2010-01-21 17:18:27
 0x405fff and (1<<40)-12010-01-21 17:18:38
*** Shelwien has left the channel2010-01-21 17:18:42
*** Shelwien has joined the channel2010-01-21 17:18:45
<pmcontext> also the weight was limited to [0 to 500]2010-01-21 17:18:51
 book1 276016 book2 1939982010-01-21 17:19:31
<toffer> it'd be good if you'd benchmark on enwik7, too2010-01-21 17:19:53
<Shelwien> well, first, we need to clear up this thing about orders2010-01-21 17:20:10
 as i said before, t[c0][ctx]2010-01-21 17:20:49
<toffer> i'm downtown now2010-01-21 17:20:57
 to have some beer2010-01-21 17:21:02
<Shelwien> with ctx being the bit context in the byte2010-01-21 17:21:07
<toffer> gonna be back in 4 hours or so2010-01-21 17:21:08
 good luck2010-01-21 17:21:12
<Shelwien> same to you :)2010-01-21 17:21:19
 so, despite being 2D, that's order0 2010-01-21 17:22:25
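
As an illustration of that point, a bitwise order-0 model in C++ where the only context is the bits already coded from the current byte (kept as a 1-prefixed value): the table can be viewed as 2D (bit position x prefix bits) but carries no previous-byte context. The shift-based counter update and the constants are placeholders, not from the chat:

#include <cstdint>

struct BitwiseOrder0 {
    uint16_t p[256];            // P(next bit = 1), 12-bit fixed point, per bit context
    int ctx = 1;                // 1-prefixed bits of the byte coded so far

    BitwiseOrder0() { for (auto &x : p) x = 2048; }   // start at p = 0.5

    int predict() const { return p[ctx]; }            // 12-bit probability of a 1-bit

    void update(int bit) {
        if (bit) p[ctx] += (4096 - p[ctx]) >> 5;      // simple shift-based counter update
        else     p[ctx] -=  p[ctx] >> 5;
        ctx = (ctx << 1) | bit;                       // extend the bit context
        if (ctx >= 256) ctx = 1;                      // a whole byte was coded, reset
    }
};
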
 so, the question is what you call "o3+o2+o1" there2010-01-21 17:22:51
<pmcontext> well i think i typed it wrong2010-01-21 17:23:13
<Shelwien> if its o0+o1+o2 actually, then 276k is ok in fact2010-01-21 17:23:16
<pmcontext> o2 + o1 + o02010-01-21 17:23:22
<Shelwien> then, by adding real o3 there, you should get around 233k2010-01-21 17:23:54
<pmcontext> i tried but it seems to use up too much memory and stops2010-01-21 17:24:59
<Shelwien> well, you need to use some better data structure2010-01-21 17:25:16
 did you try STL map at least?2010-01-21 17:25:35
<pmcontext> oh i used an array 2010-01-21 17:25:48
<Shelwien> for o3? that's crazy2010-01-21 17:25:57
<pmcontext> yes i realise now ><2010-01-21 17:26:04
 how can i use less memory ?2010-01-21 17:26:18
<Shelwien> though you can deal with arrays too, to some extent2010-01-21 17:26:20
 for example, lets consider o12010-01-21 17:26:30
 it's 256 probability distribution tables, one for each prefix byte value2010-01-21 17:26:53
 so, if instead of a single o1[256][256]2010-01-21 17:27:10
 you'd make p2[256] with pointers to tables for single contexts2010-01-21 17:28:00
 and allocate these dynamically2010-01-21 17:28:08
 then you'd use much less memory even for o12010-01-21 17:28:19
<pmcontext> i see2010-01-21 17:28:28
<Shelwien> as normally not all byte values are present in text2010-01-21 17:28:33
<pmcontext> i allocate only after i see a byte ?2010-01-21 17:29:03
<Shelwien> then, in a similar way, you can make tables of pointers to pointers, for o2, etc2010-01-21 17:29:08
 yes2010-01-21 17:29:16
 i'd say it differently though2010-01-21 17:29:28
 if you see a null context pointer value in the table2010-01-21 17:29:47
 you allocate and init that table2010-01-21 17:30:08
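
A sketch of that lazy allocation for order 1 in C++: 256 pointers instead of a flat o1[256][256], with a context's table allocated and initialized only when its prefix byte first appears (a null pointer means "never seen"). The Counter type and its initial value are placeholders:

#include <cstdint>

typedef uint16_t Counter;                       // placeholder probability counter

struct Order1 {
    Counter *tab[256] = {};                     // one slot per prefix byte, all null

    // Return the table for prefix byte c, allocating and initializing it on first use.
    Counter *get(uint8_t c) {
        if (!tab[c]) {
            tab[c] = new Counter[256];
            for (int i = 0; i < 256; ++i) tab[c][i] = 2048;   // init to p = 0.5
        }
        return tab[c];
    }

    ~Order1() { for (auto *t : tab) delete[] t; }
};
// For order 2 and up the same idea nests: a 256-entry table of pointers to
// such nodes, again allocated only when the corresponding context occurs.
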
<pmcontext> ok, and to encode that bit i don't use this order, coz it's empty?2010-01-21 17:31:16
<Shelwien> you can still init and use it after that2010-01-21 17:31:32
<pmcontext> oh ok2010-01-21 17:31:48
<Shelwien> though of course it's better to skip it at this point2010-01-21 17:31:52
<pmcontext> we mix the orders which have tables right?2010-01-21 17:32:16
<Shelwien> as i said, we can also always mix all orders, if you need simplicity2010-01-21 17:32:41
 after making sure that all the stats are allocated and initialized2010-01-21 17:32:59
 ...2010-01-21 17:33:22
<pmcontext> ok2010-01-21 17:33:57
<Shelwien> also, i'd suggest to consider some similarity there2010-01-21 17:34:14
 for example, if you don't have current o1 allocated before2010-01-21 17:34:48
 that also means that o2+ are not allocated as well2010-01-21 17:34:58
<pmcontext> yes2010-01-21 17:35:09
<Shelwien> so, the simpler implementation2010-01-21 17:35:22
<pmcontext> enwik8 36,645,7432010-01-21 17:35:23
<Shelwien> would be to store pointers to higher order with each symbol2010-01-21 17:35:56
<pmcontext> u mean like a tree ?2010-01-21 17:36:25
<Shelwien> of course :)2010-01-21 17:36:37
 i just tried to explain that tree appears automatically here2010-01-21 17:37:02
 when we try to simplify the algorithm :)2010-01-21 17:37:24
<pmcontext> but for each symbol there can be 256 children? o.o how do i point to so many2010-01-21 17:37:45
<Shelwien> what's the problem?2010-01-21 17:38:05
<pmcontext> i mean 256 different symbols can follow each symbol2010-01-21 17:38:12
<Shelwien> you have a basic context structure2010-01-21 17:38:26
<pmcontext> oh2010-01-21 17:38:40
<Shelwien> with 256 freqs... or 255 bitwise counters, doesn't matter2010-01-21 17:38:45
 and i suggest to also add 256 pointers there2010-01-21 17:38:59
<pmcontext> i see2010-01-21 17:39:43
<Shelwien> so if you have access to statistics for symbol "a" in context "xxx"2010-01-21 17:39:47
 that automatically means that you have a pointer to whole table for context "xxxa"2010-01-21 17:40:04
 and if that pointer is null, you just have to allocate a new context struct2010-01-21 17:40:30
 and store its address into the pointer2010-01-21 17:40:39
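
A sketch of that context structure in C++ (node layout and constants are mine): per-context bitwise counters plus 256 child pointers, so having the stats for symbol "a" in context "xxx" also gives the pointer to the node for "xxxa", allocated on first use:

#include <cstdint>

struct ContextNode {
    uint16_t counter[255];            // bitwise probabilities for one byte in this context
    ContextNode *child[256];          // one child per possible next byte

    ContextNode() {
        for (auto &c : counter) c = 2048;         // start at p = 0.5
        for (auto &p : child) p = nullptr;        // children not allocated yet
    }

    // Descend to the child context for byte b, allocating a new node if needed.
    ContextNode *descend(uint8_t b) {
        if (!child[b]) child[b] = new ContextNode();
        return child[b];
    }
};
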
<pmcontext> i see2010-01-21 17:41:00
<Shelwien> btw stalker2010-01-21 17:41:14
 that weird post was about representing some long numbers with continued fractions or something2010-01-21 17:41:43
 the idea was to take 100 bits of data and compute some coefs for a formula, which would output the same value2010-01-21 17:42:33
 and well, that's completely unrelated to compression2010-01-21 17:43:02
<STalKer-X> so instead of compressing the data, you compress the coeffs2010-01-21 17:43:28
<Shelwien> yeah, but in general such transformations don't reduce the amount of information2010-01-21 17:43:54
 (i mean, unless they're lossy transformations)2010-01-21 17:44:18
 and for lossless compression its best to not perform any data transformation at all2010-01-21 17:44:49
 because any transformations make it much harder to use some original correlations2010-01-21 17:45:31
 a good example of that is BWT2010-01-21 17:46:00
 if you'd transform some utf16 chinese text with it2010-01-21 17:46:23
 the compression would be much worse than even LZ probably2010-01-21 17:47:14
<pmcontext> enwik8 25946148 with crazy hashtable mixer 2010-01-21 17:49:16
 but very slow2010-01-21 17:49:28
<Shelwien> well, i'm not that good with enwik results :)2010-01-21 17:49:50
<pmcontext> book1 2420052010-01-21 17:50:06
<Shelwien> now, that's not quite good for o32010-01-21 17:50:23
 should be 233-234k 2010-01-21 17:50:35
<pmcontext> oh2010-01-21 17:50:39
<Shelwien> here's some o3 for you: http://shelwien.googlepages.com/o3bfa.rar2010-01-21 17:51:11
<pmcontext> wait, this is another program that i wrote, not the same mixer2010-01-21 17:51:22
 i use fixed weight for 5 orders2010-01-21 17:51:49
<Shelwien> that's unrelated, i'm just saying what's good for o3 :)2010-01-21 17:51:53
<pmcontext> oh ok2010-01-21 17:52:00
<Shelwien> that coder there uses another approach to mixing2010-01-21 17:52:45
 BFA aka Bruteforce Adaptation :)2010-01-21 17:53:15
 it collects the codelengths for all parameter (like weight) values from a given set2010-01-21 17:53:43
<pmcontext> o.o 233k !! u implemented it very efficiently2010-01-21 17:54:04
<Shelwien> and uses the value providing the shortest code for actual coding2010-01-21 17:54:08
 that's a program from 2002 :)2010-01-21 17:55:10
*** toffer has left the channel2010-01-21 18:15:30
*** scott___ has joined the channel2010-01-21 18:34:03
<pmcontext> i g2g bye and thanks2010-01-21 18:34:12
*** pmcontext has left the channel2010-01-21 18:34:40
<scott___> hope I did not scare him away2010-01-21 18:35:00
<STalKer-X> nope2010-01-21 18:35:59
<Shelwien> hi2010-01-21 18:36:47
 i checked comp.compression2010-01-21 18:37:17
<scott___> i may not reply quickly but I have this thing open in a tab. I am in the mood to do some coding today2010-01-21 18:37:22
 hi2010-01-21 18:37:26
<Shelwien> and besides your posts, there was also that guy with huffman compression for database records2010-01-21 18:38:08
<scott___> I thought a lot about DNA compression and I have some ideas to try out that will hopefully work out pretty neatly2010-01-21 18:38:17
 on here2010-01-21 18:38:31
 the guy on comp.compression2010-01-21 18:38:45
<Shelwien> anyway, thanks to him, i just remembered the possibility of bitcoding more compactly than huffman :)2010-01-21 18:39:53
<scott___> was his name glenn2010-01-21 18:40:45
<Shelwien> yeah2010-01-21 18:40:55
<scott___> was he just on? what was his nick2010-01-21 18:41:05
<Shelwien> he didn't appear here if you mean that :)2010-01-21 18:41:44
 i just read his post in comp.compression and it reminded me about nonprefix coding2010-01-21 18:42:33
 anyway, food calls, so later :)2010-01-21 18:42:41
<scott___> oh I think comp.compression is a step down from here, so I think I post there differently than here2010-01-21 18:43:10
 later then2010-01-21 18:43:34
<Shelwien> yesterday I found that finereader 10 finally has support for japanese in the main branch2010-01-21 19:14:52
 (it seemed to have it long before too, but it was only distributed with SDK or something - wasn't able to find it)2010-01-21 19:15:32
 but i guess it was too early to be happy about that :)2010-01-21 19:16:04
 i tried OCRing a book2010-01-21 19:16:09
 and it seems that it's still bad for this type of symbols2010-01-21 19:17:17
 in the end, even though finereader has some very nice GUI features2010-01-21 19:18:05
 i guess readiris, which i used before, is still better for japanese, despite its very simplistic interface2010-01-21 19:19:22
<scott___> 1139718 on e.coli and not done yet2010-01-21 20:01:07
<Shelwien> i guess we'd need to find more genetic data for this2010-01-21 20:02:33
 i mean, an e.coli compressor is an even weirder thing than a book1 compressor :)2010-01-21 20:03:07
<scott___> well nature is making its own rules2010-01-21 20:03:26
*** toffer has joined the channel2010-01-21 21:00:42
<toffer> hi2010-01-21 21:02:06
<Shelwien> hi2010-01-21 21:16:05
 pmcontext expired2010-01-21 21:16:17
 matt added an entry for acb: http://mattmahoney.net/dc/text.html#21852010-01-21 21:22:08