*** STalKer-X has joined the channel | 2010-01-21 04:24:57 |
*** STalKer-Y has left the channel | 2010-01-21 04:24:58 |
*** scott___ has joined the channel | 2010-01-21 04:25:16 |
<scott___> | hi :) | 2010-01-21 04:26:28 |
| its not much but its a start to pack acgt files binary bijective and back http://bijective.dogma.net/dna.zip | 2010-01-21 04:28:23 |
| later | 2010-01-21 04:28:43 |
*** scott___ has left the channel | 2010-01-21 04:28:52 |
*** pinc has joined the channel | 2010-01-21 07:33:13 |
<STalKer-X> | hey | 2010-01-21 12:25:47 |
| Shelwien: what do you think about http://encode.dreamhosters.com/showthread.php?p=10807#post10807 ? | 2010-01-21 12:25:57 |
*** toffer has joined the channel | 2010-01-21 12:28:18 |
<toffer> | hi | 2010-01-21 12:29:02 |
<STalKer-X> | heh | 2010-01-21 12:37:58 |
<toffer> | any news in here | 2010-01-21 12:49:45 |
| guess not that much | 2010-01-21 12:49:49 |
<STalKer-X> | just asked Shelwien about http://encode.dreamhosters.com/showthread.php?p=10807#post10807 :) | 2010-01-21 12:54:41 |
| but he is still sleeping | 2010-01-21 12:54:48 |
<toffer> | that looks rather esotheric to me | 2010-01-21 13:00:13 |
| i didn't pay much attention though | 2010-01-21 13:00:25 |
<STalKer-X> | heh | 2010-01-21 13:00:47 |
| maybe if someone actually tried it o_o | 2010-01-21 13:01:15 |
<toffer> | i'd really say i won't bother to look at that explaination at the website | 2010-01-21 13:06:33 |
| i might be biased but that doesn't look interesting to me | 2010-01-21 13:07:05 |
| are you writing any compressors btw? | 2010-01-21 13:26:06 |
<STalKer-X> | none | 2010-01-21 13:26:24 |
<toffer> | just lurking around - or why are you here? | 2010-01-21 13:32:20 |
<STalKer-X> | curiosity | 2010-01-21 13:54:31 |
<toffer> | if you're courious why not just try to program something? | 2010-01-21 13:57:17 |
<STalKer-X> | i cannot program :( | 2010-01-21 13:59:54 |
<toffer> | you can still try to learn | 2010-01-21 14:00:41 |
| what's your education despite that? | 2010-01-21 14:00:50 |
<STalKer-X> | student | 2010-01-21 14:00:58 |
<toffer> | and what're you studying? | 2010-01-21 14:01:30 |
<STalKer-X> | earth science | 2010-01-21 14:01:53 |
<toffer> | erm? | 2010-01-21 14:02:27 |
| what is that | 2010-01-21 14:03:38 |
<STalKer-X> | never heard of it? :o | 2010-01-21 14:04:47 |
<toffer> | nope | 2010-01-21 14:04:50 |
<STalKer-X> | geologie, geowissenschaften, geotechnologie :) | 2010-01-21 14:06:55 |
<toffer> | dann sag doch gleich geologie | 2010-01-21 14:07:36 |
| darunter kann ich mir ehr was vorstellen | 2010-01-21 14:07:43 |
<STalKer-X> | heisst auf englisch eben earth science :p | 2010-01-21 14:08:06 |
<toffer> | übrigens ist dimitry shkarin (ppmonstr&co) auch geologe - | 2010-01-21 14:08:09 |
| ich kenn nur "geology" | 2010-01-21 14:08:39 |
| aber kein "earth science" | 2010-01-21 14:08:48 |
| und wo studierst du? | 2010-01-21 14:09:43 |
| away for coffee | 2010-01-21 14:15:25 |
<STalKer-X> | tu berlin ;) | 2010-01-21 14:19:36 |
*** pmcontext has joined the channel | 2010-01-21 15:11:29 |
<pmcontext> | Hi | 2010-01-21 15:12:01 |
| hi toffer u there | 2010-01-21 15:15:03 |
<toffer> | hi | 2010-01-21 15:51:55 |
| now, yes | 2010-01-21 15:51:57 |
<pmcontext> | wb | 2010-01-21 15:52:03 |
| any luck with derivation :P | 2010-01-21 15:52:41 |
<toffer> | i didn't continue | 2010-01-21 15:53:02 |
| the first time i derived it it took quite some time | 2010-01-21 15:53:24 |
| i was supposed to use some tricks and reorderings | 2010-01-21 15:53:48 |
<pmcontext> | what do we do then o.o | 2010-01-21 15:54:02 |
<toffer> | but you can still use a different approach for updating your linear weights | 2010-01-21 15:54:06 |
| i mean iterative, numeric minimization | 2010-01-21 15:54:24 |
<pmcontext> | ok how does it work master | 2010-01-21 15:55:03 |
<toffer> | the most simple technique would be gradient descent | 2010-01-21 15:55:25 |
| you're familiar with that? | 2010-01-21 15:55:33 |
<pmcontext> | i heard about it a few times | 2010-01-21 15:56:12 |
<toffer> | just have a look at http://en.wikipedia.org/wiki/Gradient_descent to get a basic understanding | 2010-01-21 15:56:43 |
<pmcontext> | ok | 2010-01-21 15:56:50 |
<toffer> | mirroring the counter cost function we want to minimize H(w) | 2010-01-21 16:00:45 |
<pmcontext> | i see | 2010-01-21 16:01:44 |
| wat is H(w) | 2010-01-21 16:01:56 |
<toffer> | H(w) = -sum i=1..k d_i ( y_i ln p_i + (1-y_i) ln (1-p_i) ), with d_i=c^(i-1), p_i = (1-w) p_0i + w p_1i | 2010-01-21 16:02:45 |
| and with gradient descent | 2010-01-21 16:03:20 |
| you do it like that | 2010-01-21 16:03:23 |
| initially you set w=w0 (i.e. w0=0.5) | 2010-01-21 16:03:38 |
<pmcontext> | ok w0 is 0.5 | 2010-01-21 16:04:01 |
<toffer> | than you calculate w' = w - L dH(w)/dw | 2010-01-21 16:04:27 |
| where L is some small constant | 2010-01-21 16:04:32 |
<pmcontext> | i see so l is controls the amount of change in w ? | 2010-01-21 16:04:59 |
| L* | 2010-01-21 16:05:06 |
<toffer> | not exactly | 2010-01-21 16:05:13 |
| it is some damping constant | 2010-01-21 16:05:20 |
<pmcontext> | oh i get | 2010-01-21 16:05:30 |
<toffer> | normally L would be computed in each iteration | 2010-01-21 16:05:31 |
| but using the exact method you'd get unacceptable speeds | 2010-01-21 16:06:00 |
| even just with a single mixer | 2010-01-21 16:06:10 |
<pmcontext> | very slow if we iterate i guess | 2010-01-21 16:06:16 |
<toffer> | thus it (has to be) enough to do just one iteration | 2010-01-21 16:06:42 |
| it'd be best if you write it down on a paper and find dH/dw | 2010-01-21 16:07:19 |
<pmcontext> | ok | 2010-01-21 16:07:57 |
| is there any rule about derivation of a summation | 2010-01-21 16:08:37 |
<toffer> | well... d/dt(x + y) = dx/dt + dy/dt | 2010-01-21 16:09:12 |
| for finite sums you can interchange the order of summation/derivation | 2010-01-21 16:09:36 |
| actually you should face a problem right after finding the derivative (w/o simplifications) | 2010-01-21 16:10:13 |
<pmcontext> | sure i can face problem even before ^_^ | 2010-01-21 16:10:53 |
<toffer> | dH/dw = - sum i=1..k d_i dp_i/dw ( y_i/p_i - (1-y_i)/(1-p_i ) ) | 2010-01-21 16:12:13 |
| got that? | 2010-01-21 16:12:33 |
<pmcontext> | yes i got the ( y_i/p_i - (1-y_i)/(1-p_i ) ) , but missed out dp_i/dw | 2010-01-21 16:12:56 |
<toffer> | question: is the gradient linear in w? | 2010-01-21 16:13:22 |
<pmcontext> | hmm not sure o.o | 2010-01-21 16:14:44 |
<toffer> | ( y_i/p_i - (1-y_i)/(1-p_i ) ) where is w in there? | 2010-01-21 16:14:57 |
<pmcontext> | there isnt | 2010-01-21 16:15:06 |
<toffer> | there is | 2010-01-21 16:15:24 |
| p_i = (1-w) p_0i + w p_1i -> the mixed probability | 2010-01-21 16:15:36 |
| now again - is dH/dw linear in w? | 2010-01-21 16:16:05 |
<pmcontext> | one sec i will check my book | 2010-01-21 16:18:23 |
<toffer> | you can actually see it | 2010-01-21 16:18:36 |
| for example there is the expression y_i/p_i | 2010-01-21 16:18:46 |
| = y_i/( (1-w) p_0i + w p_1i) thus that expression is ~ 1/w | 2010-01-21 16:19:09 |
| thus the gradient is nonlinear in w | 2010-01-21 16:19:17 |
| as a consequence you cannot easily compute the exact gradient at each step | 2010-01-21 16:19:37 |
| that's the problem i meant | 2010-01-21 16:19:44 |
<pmcontext> | o.o so p is formed by w , im not sure wat is meant by linear in w | 2010-01-21 16:20:10 |
<toffer> | well a linear function | 2010-01-21 16:20:20 |
| f(x) = ax + b is linear in x | 2010-01-21 16:20:26 |
<pmcontext> | coz there is no higher powers of x ? | 2010-01-21 16:20:39 |
<toffer> | f(x) = 1/(ax + b) is nonlinear in x | 2010-01-21 16:20:39 |
<pmcontext> | i see | 2010-01-21 16:20:55 |
<toffer> | ok | 2010-01-21 16:21:01 |
<pmcontext> | ok i understand little | 2010-01-21 16:21:15 |
<toffer> | and a further simplification one assumes that the weight was optimal in each step prior the the current, thus the sum in dH/dw vanishes | 2010-01-21 16:21:16 |
| but did you understand that you cannot easily compute dH/dw | 2010-01-21 16:21:39 |
| since it's nonlinear in w | 2010-01-21 16:21:46 |
<pmcontext> | yes it is due to the 1/w | 2010-01-21 16:22:07 |
| and the summation | 2010-01-21 16:22:24 |
<toffer> | as to linear - a function f is linear in x, when it can be formulated like f(x)=ax+b | 2010-01-21 16:22:32 |
| dH/dw = - sum i=1..k d_i dp_i/dw ( y_i/p_i - (1-y_i)/(1-p_i ) ) | 2010-01-21 16:22:41 |
| that's the gradient | 2010-01-21 16:22:46 |
<pmcontext> | so we remove summation to make it simpler ? | 2010-01-21 16:23:04 |
<toffer> | now suppose that in previous steps dH/dw for i>1 the terms vanish. that means you suppose that the weight w was optimal in previous iterations | 2010-01-21 16:23:37 |
| yes | 2010-01-21 16:23:40 |
| otherwise you cannot compute it in an acceptable time | 2010-01-21 16:23:55 |
<pmcontext> | i see | 2010-01-21 16:24:15 |
<toffer> | thus you approximate | 2010-01-21 16:24:22 |
| dH/dw = d_1 dp_1/dw ( y_1/p_1 - (1-y_1)/(1-p_1) | 2010-01-21 16:24:54 |
| d_1 = c^(1-1) = 1 | 2010-01-21 16:25:04 |
| and p_1 = (1-w) p_01 + w p_11 | 2010-01-21 16:25:30 |
| = (p_11-p_01) w + p_01 = dp_1/dw w + p_01 | 2010-01-21 16:26:06 |
<pmcontext> | ah this looks much better | 2010-01-21 16:26:22 |
<toffer> | still it contains a division | 2010-01-21 16:26:34 |
| which can make it slow | 2010-01-21 16:26:38 |
| maks | 2010-01-21 16:26:58 |
| makes | 2010-01-21 16:26:59 |
| but that can already be an acceptible solution | 2010-01-21 16:27:11 |
| it can be rewritten | 2010-01-21 16:27:52 |
<pmcontext> | in more simpler way ? | 2010-01-21 16:28:09 |
<toffer> | ( y_1/p_1 - (1-y_1)/(1-p_1) ) = (y_i-p_i)/(p_i (1-p_i) ) | 2010-01-21 16:29:06 |
| y_i - p_i is the prediction error | 2010-01-21 16:29:14 |
| for example | 2010-01-21 16:29:19 |
| the mixer predicted p_i=0.8 | 2010-01-21 16:29:27 |
| and y_i was 1 | 2010-01-21 16:29:33 |
| the error is 0.2 | 2010-01-21 16:29:36 |
<pmcontext> | oh i see | 2010-01-21 16:29:45 |
<toffer> | now suppose the following | 2010-01-21 16:29:47 |
| we define a function g(x) = 1/(x*(1-x)) for x in (0, 1) | 2010-01-21 16:30:15 |
<pmcontext> | y_i is the encoded bit right ? | 2010-01-21 16:30:19 |
<toffer> | yes | 2010-01-21 16:30:22 |
| whops, i mixed up indices | 2010-01-21 16:30:32 |
| since it's the current iteration i=1 | 2010-01-21 16:30:42 |
| or you can simply drop inidices | 2010-01-21 16:30:47 |
<pmcontext> | oh ok | 2010-01-21 16:30:51 |
<toffer> | question: what do you do with g(x) ? | 2010-01-21 16:31:01 |
| to simply processing | 2010-01-21 16:31:06 |
<pmcontext> | but it is 1/0 for x=0 or 1 | 2010-01-21 16:31:55 |
<toffer> | that's why i've written x in (0, 1) (exclusive interval) | 2010-01-21 16:32:20 |
<pmcontext> | ohhh i see | 2010-01-21 16:32:29 |
*** pinc has left the channel | 2010-01-21 16:32:39 |
<toffer> | (0, 1) = [0, 1] / {0} / {1} | 2010-01-21 16:32:40 |
| again | 2010-01-21 16:32:51 |
| what do you use g(x) for | 2010-01-21 16:32:54 |
<pmcontext> | as a damper ? | 2010-01-21 16:33:16 |
<toffer> | no | 2010-01-21 16:33:26 |
| let me rewrite it again | 2010-01-21 16:33:29 |
<pmcontext> | oh i think g(x) give a big value for x (0,1) | 2010-01-21 16:34:04 |
<toffer> | dH/dw = dp/dw (y-p) 1/(p (1-p)) | 2010-01-21 16:34:08 |
| dH/dw = dp/dw (y-p) g(p) | 2010-01-21 16:34:35 |
<pmcontext> | i see | 2010-01-21 16:34:44 |
<toffer> | just to eliminate the division | 2010-01-21 16:34:56 |
| since it gonna be slow | 2010-01-21 16:34:59 |
<pmcontext> | but what is g(p) meaning | 2010-01-21 16:35:07 |
<toffer> | oh i forgot | 2010-01-21 16:35:08 |
| it's just a term appearing in the simplyfied form of dH/dw | 2010-01-21 16:35:22 |
<pmcontext> | i understnad y-p is prediction error , but g(p) o.o | 2010-01-21 16:35:38 |
<toffer> | dH/dw = dp/dw [y/p - (1-y)/(1-p)] | 2010-01-21 16:36:04 |
| and | 2010-01-21 16:36:08 |
| y/p - (1-y)/(1-p) = (y-p)/(p(1-p)) | 2010-01-21 16:36:29 |
| just calculate it yourself | 2010-01-21 16:36:39 |
| that y-p would be there if you'd minimize the rms (y-p)^2 | 2010-01-21 16:37:02 |
| but were're supposed to do compression, thus minimizing the entropy | 2010-01-21 16:37:18 |
<pmcontext> | afte simplify we get (y-p) 1/(p (1-p)) , and g(p) = 1/(p(1-p)) | 2010-01-21 16:37:32 |
<toffer> | yes | 2010-01-21 16:37:39 |
<pmcontext> | i see | 2010-01-21 16:37:57 |
<toffer> | and that term g(p) rescales the quadratic metric into an entropy metric - but this would be a very *bloomy* interpretation | 2010-01-21 16:38:30 |
| again | 2010-01-21 16:38:52 |
| how does dH/dw look like alltogether? | 2010-01-21 16:39:02 |
<pmcontext> | scarry dp/dw [y/p - (1-y)/(1-p)] | 2010-01-21 16:39:18 |
<toffer> | and rewritten with g(p) ? | 2010-01-21 16:39:35 |
| and expanding dp/dw | 2010-01-21 16:39:51 |
<pmcontext> | after try to simplify dp/dw (y-p)g(p) | 2010-01-21 16:39:56 |
<toffer> | dH/dw = (p1-p0) (y-p) 1/(p(1-p)) = (p1-p0) (y-p) g(p) | 2010-01-21 16:40:44 |
| with p=(p1-p0) w + p0 = (1-w) p0 + w p1 | 2010-01-21 16:41:09 |
| now there's another issue | 2010-01-21 16:41:27 |
| since we use limited precision | 2010-01-21 16:41:36 |
| and numerica stability has to be guaranteed | 2010-01-21 16:41:44 |
<pmcontext> | u mean it can go crazy ? | 2010-01-21 16:42:09 |
<toffer> | yes | 2010-01-21 16:42:15 |
| you should only adjust the weight if g(p) is in a limited range | 2010-01-21 16:42:19 |
| and check its bounds | 2010-01-21 16:42:23 |
<pmcontext> | i see | 2010-01-21 16:42:40 |
<toffer> | 1. bounds | 2010-01-21 16:42:49 |
| w' = w + dH/dw | 2010-01-21 16:42:59 |
| and since w actually is a probability | 2010-01-21 16:43:08 |
| it has to be limited in [0, 1] | 2010-01-21 16:43:17 |
<pmcontext> | yes it should be in some fixed range | 2010-01-21 16:43:20 |
| can i use [0, 64] ? | 2010-01-21 16:43:31 |
<toffer> | thus w' = max(0, min(1, w+dH/dw) ) | 2010-01-21 16:43:32 |
| if you like to work with 6 bit precision, yes | 2010-01-21 16:43:43 |
| but you can use more | 2010-01-21 16:43:48 |
| and that'd help greatly | 2010-01-21 16:43:53 |
<pmcontext> | oh ok more | 2010-01-21 16:43:53 |
<toffer> | i typically use 12..15 bits | 2010-01-21 16:44:03 |
| and secondly | 2010-01-21 16:44:14 |
<pmcontext> | ok | 2010-01-21 16:44:20 |
<toffer> | if p is near 0 | 2010-01-21 16:44:22 |
| or near 1 | 2010-01-21 16:44:25 |
| you won't want to adjust weihts | 2010-01-21 16:44:30 |
| and "near" can be classified by some threshold t | 2010-01-21 16:44:44 |
| thus | 2010-01-21 16:44:46 |
| if p<t or p>=1-t don'T do anything | 2010-01-21 16:44:58 |
| and to make things more efficient | 2010-01-21 16:45:05 |
| you can say | 2010-01-21 16:45:13 |
| w' = w + dH/dw -> becomes w' = w for the given condition | 2010-01-21 16:45:34 |
| to avoid comparsions | 2010-01-21 16:45:47 |
| you set g(p) = 0 for p<t or p>=t | 2010-01-21 16:46:00 |
<pmcontext> | oh | 2010-01-21 16:46:08 |
| so that makes large change to w if i update | 2010-01-21 16:46:21 |
<toffer> | thus dH/dw = dH/dw = (p1-p0) (y-p) g(p) = (p1-p0) (y-p) * 0 = 0 | 2010-01-21 16:46:27 |
| you avoid large changes | 2010-01-21 16:46:36 |
<pmcontext> | oh i see | 2010-01-21 16:46:41 |
<toffer> | so what you're now supposed to do | 2010-01-21 16:46:50 |
| is to write a mixer class | 2010-01-21 16:46:58 |
| and a function class | 2010-01-21 16:47:08 |
| the function class should be a lookup for g(p) | 2010-01-21 16:47:17 |
<pmcontext> | g(p) is pre calculated ? | 2010-01-21 16:47:41 |
<toffer> | yes | 2010-01-21 16:47:44 |
<pmcontext> | ok | 2010-01-21 16:47:54 |
<toffer> | oh | 2010-01-21 16:47:59 |
| i forgot | 2010-01-21 16:48:01 |
| w' = w + L dH/dw | 2010-01-21 16:48:08 |
<pmcontext> | damping o.o | 2010-01-21 16:48:14 |
<toffer> | yes | 2010-01-21 16:48:23 |
<pmcontext> | but isnt float bad ? | 2010-01-21 16:48:38 |
<toffer> | that L can be simplified into something like L=2^-B | 2010-01-21 16:48:40 |
| thus you can simply write dH/dw >> B | 2010-01-21 16:48:51 |
<pmcontext> | oh | 2010-01-21 16:48:58 |
| ok | 2010-01-21 16:49:03 |
<toffer> | that isn't float based | 2010-01-21 16:49:23 |
| you can use fixed point math | 2010-01-21 16:49:26 |
<pmcontext> | ok i und it is power of 2 | 2010-01-21 16:49:34 |
| so i can shift | 2010-01-21 16:49:44 |
| to divide | 2010-01-21 16:49:50 |
<toffer> | like [0, 1) maps to [0, 4095) | 2010-01-21 16:49:51 |
| right | 2010-01-21 16:50:03 |
| erm | 2010-01-21 16:50:08 |
| like [0, 1) maps to [0, 4096) | 2010-01-21 16:50:17 |
<pmcontext> | ok [0,4095] | 2010-01-21 16:50:32 |
<toffer> | note that the implementation is sensitive to values the assignment of precisions and to the value of L | 2010-01-21 16:51:04 |
| selecting wrong values can cause some trouble | 2010-01-21 16:51:16 |
| than there's another weighting scheme which can be derived in a similar manner | 2010-01-21 16:51:45 |
| but i guess that's enough for today | 2010-01-21 16:52:04 |
| ok? | 2010-01-21 16:52:11 |
<pmcontext> | yes | 2010-01-21 16:52:13 |
| dH/dw = (p1-p0) (y-p) g(p) and g(p) = 1/(p(1-p)) | 2010-01-21 16:52:18 |
| w' = w + L dH/dw | 2010-01-21 16:52:42 |
| w' needs bounds check | 2010-01-21 16:53:07 |
| and g(p) = 0 if p < t1 or p > t2 near the corner | 2010-01-21 16:53:45 |
| L is power of 2 to avoid division | 2010-01-21 16:54:09 |
<toffer> | right | 2010-01-21 16:54:11 |
| but now please double check if there's no sign error | 2010-01-21 16:54:22 |
| i might have missed a sign while writing | 2010-01-21 16:54:32 |
| (in dH/dw) | 2010-01-21 16:55:03 |
| so happy programming ^^ | 2010-01-21 16:55:18 |
<pmcontext> | o.o thank you | 2010-01-21 16:55:23 |
<toffer> | the sign error can blow up everything, if any | 2010-01-21 16:55:37 |
<pmcontext> | i will figure out the sign | 2010-01-21 16:55:44 |
<toffer> | since it'd mean it maximimizes the entropy instead of minimizing | 2010-01-21 16:55:53 |
<pmcontext> | yes i und | 2010-01-21 16:56:00 |
| i will check value of w for each update | 2010-01-21 16:56:13 |
| and its L dH/dw | 2010-01-21 16:56:30 |
| i save the chat ^^ | 2010-01-21 16:57:51 |
| for reference | 2010-01-21 16:58:08 |
<toffer> | good idea | 2010-01-21 16:59:56 |
| and as i said in ordinary minimization problems L would be calculated with some similar scheme every step. but that's unacceptably slow in our case | 2010-01-21 17:00:41 |
| there's another way to include an L which is not a power of two, btw | 2010-01-21 17:01:21 |
| you set | 2010-01-21 17:01:23 |
| w' = w + d | 2010-01-21 17:01:29 |
| d = L dH/dw | 2010-01-21 17:01:39 |
| d = L (p1-p0) (y-p) g(p) = (p1-p0) (y-p) (L g(p)) and you set L g(p) = h(p) or something like that | 2010-01-21 17:02:24 |
<pmcontext> | oh | 2010-01-21 17:09:12 |
| but we make lookup only for g(p) ? | 2010-01-21 17:10:03 |
| or L*g(p) | 2010-01-21 17:10:12 |
<toffer> | you can do that as you like | 2010-01-21 17:10:53 |
| i'd say just try g(p) | 2010-01-21 17:11:01 |
<pmcontext> | ok | 2010-01-21 17:11:22 |
<toffer> | for simplicity | 2010-01-21 17:11:32 |
<Shelwien> | hi | 2010-01-21 17:11:42 |
| are you implementing logistic mixing there or what? :) | 2010-01-21 17:11:52 |
<pmcontext> | o.o no logistic , coz i duno it yet | 2010-01-21 17:13:47 |
<toffer> | linear mixing with gradient descent optimization | 2010-01-21 17:14:02 |
<pmcontext> | that is meaning of logistic ? | 2010-01-21 17:14:22 |
<Shelwien> | no | 2010-01-21 17:14:28 |
<toffer> | i was just answering shelwien | 2010-01-21 17:14:48 |
<Shelwien> | but update is kinda similar for logistic | 2010-01-21 17:14:50 |
<toffer> | shelwiens question | 2010-01-21 17:14:53 |
| yes | 2010-01-21 17:14:56 |
| sicne it's gradient based | 2010-01-21 17:15:00 |
| and there is some d ln(...) / dw involved | 2010-01-21 17:15:13 |
| that's why it looks similar | 2010-01-21 17:15:17 |
<Shelwien> | anyway, i'd suggest to try something simpler first | 2010-01-21 17:15:19 |
| like finding the context with the best prediction | 2010-01-21 17:15:55 |
| ( max(bit?p:1-p) ) | 2010-01-21 17:16:10 |
| and incrementing its freq | 2010-01-21 17:16:26 |
<pmcontext> | i did this and with my o3+o2+o1 | 2010-01-21 17:16:44 |
<Shelwien> | then using these freqs as weights | 2010-01-21 17:16:45 |
| and? | 2010-01-21 17:16:57 |
<pmcontext> | it is a bit slow but book1 276016 | 2010-01-21 17:17:55 |
<toffer> | with my genetic optimizer i found that two optimal context for book1 are o2 and o5 | 2010-01-21 17:18:24 |
| to be specific | 2010-01-21 17:18:27 |
| 0x405fff and (1<<40)-1 | 2010-01-21 17:18:38 |
*** Shelwien has left the channel | 2010-01-21 17:18:42 |
*** Shelwien has joined the channel | 2010-01-21 17:18:45 |
<pmcontext> | also the weight was limited to [0 to 500] | 2010-01-21 17:18:51 |
| book1 276016 book2 193998 | 2010-01-21 17:19:31 |
<toffer> | it'd be good if you'd benchmark on enwik7, too | 2010-01-21 17:19:53 |
<Shelwien> | well, first, we need to clear up this thing about orders | 2010-01-21 17:20:10 |
| as i said before, t[c0][ctx] | 2010-01-21 17:20:49 |
<toffer> | i'm downtown now | 2010-01-21 17:20:57 |
| to have some beer | 2010-01-21 17:21:02 |
<Shelwien> | with ctx being the bit context in the byte | 2010-01-21 17:21:07 |
<toffer> | gonna be back in 4 hours or so | 2010-01-21 17:21:08 |
| good luck | 2010-01-21 17:21:12 |
<Shelwien> | same to you :) | 2010-01-21 17:21:19 |
| so, despite being 2D, that's order0 | 2010-01-21 17:22:25 |
| so, the question is what you call "o3+o2+o1" there | 2010-01-21 17:22:51 |
<pmcontext> | well i think i type wrong | 2010-01-21 17:23:13 |
<Shelwien> | if its o0+o1+o2 actually, then 276k is ok in fact | 2010-01-21 17:23:16 |
<pmcontext> | o2 + o1 + o0 | 2010-01-21 17:23:22 |
<Shelwien> | then, by adding real o3 there, you should get around 233k | 2010-01-21 17:23:54 |
<pmcontext> | i tried but it seem to use up much memory and stops | 2010-01-21 17:24:59 |
<Shelwien> | well, you need to use some better data structure | 2010-01-21 17:25:16 |
| did you try STL map at least? | 2010-01-21 17:25:35 |
<pmcontext> | oh i used an array | 2010-01-21 17:25:48 |
<Shelwien> | for o3? thats crazy | 2010-01-21 17:25:57 |
<pmcontext> | yes i realise now >< | 2010-01-21 17:26:04 |
| how can i use less memory ? | 2010-01-21 17:26:18 |
<Shelwien> | though you can deal with arrays too, to some extent | 2010-01-21 17:26:20 |
| for example, lets consider o1 | 2010-01-21 17:26:30 |
| its 256 probability distribution tables, one for each prefix byte value | 2010-01-21 17:26:53 |
| so, if instead of a single o1[256][256] | 2010-01-21 17:27:10 |
| you'd make p2[256] with pointers to tables for single contexts | 2010-01-21 17:28:00 |
| and allocate these dynamically | 2010-01-21 17:28:08 |
| then you'd use much less memory even for o1 | 2010-01-21 17:28:19 |
<pmcontext> | i see | 2010-01-21 17:28:28 |
<Shelwien> | as normally not all byte values are present in text | 2010-01-21 17:28:33 |
<pmcontext> | i allocate only after i see a byte ? | 2010-01-21 17:29:03 |
<Shelwien> | then, in a similar way, you can make tables of pointers to pointers, for o2, etc | 2010-01-21 17:29:08 |
| yes | 2010-01-21 17:29:16 |
| i'd say it differently though | 2010-01-21 17:29:28 |
| if you see a null context pointer value in the table | 2010-01-21 17:29:47 |
| you allocate and init that table | 2010-01-21 17:30:08 |
<pmcontext> | ok , and to encode that bit i dont use this order coz it empty ? | 2010-01-21 17:31:16 |
<Shelwien> | you can still init and use it after that | 2010-01-21 17:31:32 |
<pmcontext> | oh ok | 2010-01-21 17:31:48 |
<Shelwien> | though of course its better to skip it at this point | 2010-01-21 17:31:52 |
<pmcontext> | we mix the orders which have tables right? | 2010-01-21 17:32:16 |
<Shelwien> | as i said, we also can always mix all orders, if you need simplicity | 2010-01-21 17:32:41 |
| after making sure that all the stats are allocated and initialized | 2010-01-21 17:32:59 |
| ... | 2010-01-21 17:33:22 |
<pmcontext> | ok | 2010-01-21 17:33:57 |
<Shelwien> | also, i'd suggest to consider some similarity there | 2010-01-21 17:34:14 |
| for example, if you don't have current o1 allocated before | 2010-01-21 17:34:48 |
| that also means that o2+ are not allocated as well | 2010-01-21 17:34:58 |
<pmcontext> | yes | 2010-01-21 17:35:09 |
<Shelwien> | so, the simpler implementation | 2010-01-21 17:35:22 |
<pmcontext> | enwik8 36,645,743 | 2010-01-21 17:35:23 |
<Shelwien> | would be to store pointers to higher order with each symbol | 2010-01-21 17:35:56 |
<pmcontext> | u mean like a tree ? | 2010-01-21 17:36:25 |
<Shelwien> | of course :) | 2010-01-21 17:36:37 |
| i just tried to explain that tree appears automatically here | 2010-01-21 17:37:02 |
| when we try to simplify the algorithm :) | 2010-01-21 17:37:24 |
<pmcontext> | but for each symbol there can be 256 child ? o.o how i point to so many | 2010-01-21 17:37:45 |
<Shelwien> | what's the problem? | 2010-01-21 17:38:05 |
<pmcontext> | i mean 256 different symbols can folow each symbol | 2010-01-21 17:38:12 |
<Shelwien> | you have a basic context structure | 2010-01-21 17:38:26 |
<pmcontext> | oh | 2010-01-21 17:38:40 |
<Shelwien> | with 256 freqs... or 255 bitwise counters, don't matter | 2010-01-21 17:38:45 |
| and i suggest to also add 256 pointers there | 2010-01-21 17:38:59 |
<pmcontext> | i see | 2010-01-21 17:39:43 |
<Shelwien> | so if you have access to statistics for symbol "a" in context "xxx" | 2010-01-21 17:39:47 |
| that automatically means that you have a pointer to whole table for context "xxxa" | 2010-01-21 17:40:04 |
| and if that pointer is null, you just have to allocate a new context struct | 2010-01-21 17:40:30 |
| and store its address into the pointer | 2010-01-21 17:40:39 |
<pmcontext> | i see | 2010-01-21 17:41:00 |
<Shelwien> | btw stalker | 2010-01-21 17:41:14 |
| that weird post was about representing some long numbers with chain fractions or something | 2010-01-21 17:41:43 |
| the idea was to take 100 bits of data and compute some coefs for a formula, which would output the same value | 2010-01-21 17:42:33 |
| and well, that's completely unrelated to compression | 2010-01-21 17:43:02 |
<STalKer-X> | so instead compressing the data, you compress the coeffs | 2010-01-21 17:43:28 |
<Shelwien> | yeah, but in general such transformation don't reduce the amount of information | 2010-01-21 17:43:54 |
| (i mean, unless they're lossy transformations) | 2010-01-21 17:44:18 |
| and for lossless compression its best to not perform any data transformation at all | 2010-01-21 17:44:49 |
| because any transformations make it much harder to use some original correlations | 2010-01-21 17:45:31 |
| a good example of that is BWT | 2010-01-21 17:46:00 |
| if you'd transform some utf16 chinese text with it | 2010-01-21 17:46:23 |
| the compression would be much worse than even LZ probably | 2010-01-21 17:47:14 |
<pmcontext> | enwik8 25946148 with crazy hashtable mixer | 2010-01-21 17:49:16 |
| very slow but | 2010-01-21 17:49:28 |
<Shelwien> | well, i'm not that good with enwik results :) | 2010-01-21 17:49:50 |
<pmcontext> | book1 242005 | 2010-01-21 17:50:06 |
<Shelwien> | now, that's not quite good for o3 | 2010-01-21 17:50:23 |
| should be 233-234k | 2010-01-21 17:50:35 |
<pmcontext> | oh | 2010-01-21 17:50:39 |
<Shelwien> | here's some o3 for you: http://shelwien.googlepages.com/o3bfa.rar | 2010-01-21 17:51:11 |
<pmcontext> | wait this is other program that i wrote , not same mixer | 2010-01-21 17:51:22 |
| i use fixed weight for 5 orders | 2010-01-21 17:51:49 |
<Shelwien> | that's unrelated, i'm just saying what's good for o3 :) | 2010-01-21 17:51:53 |
<pmcontext> | oh ok | 2010-01-21 17:52:00 |
<Shelwien> | that coder there uses another approach to mixing | 2010-01-21 17:52:45 |
| BFA aka Bruteforce Adaptation :) | 2010-01-21 17:53:15 |
| it collects the codelengths for all parameter (like weight) values from a given set | 2010-01-21 17:53:43 |
<pmcontext> | o.o 233k !! u implement very efficiently | 2010-01-21 17:54:04 |
<Shelwien> | and uses the value providing the shortest code for actual coding | 2010-01-21 17:54:08 |
| that's a program from 2002 :) | 2010-01-21 17:55:10 |
*** toffer has left the channel | 2010-01-21 18:15:30 |
*** scott___ has joined the channel | 2010-01-21 18:34:03 |
<pmcontext> | i g2g bye and thanks | 2010-01-21 18:34:12 |
*** pmcontext has left the channel | 2010-01-21 18:34:40 |
<scott___> | hope I did not scare him away | 2010-01-21 18:35:00 |
<STalKer-X> | nope | 2010-01-21 18:35:59 |
<Shelwien> | hi | 2010-01-21 18:36:47 |
| i checked comp.compression | 2010-01-21 18:37:17 |
<scott___> | i may not reply quicikly but I have this thing on a tab I am in the mood to do some coding today | 2010-01-21 18:37:22 |
| hi | 2010-01-21 18:37:26 |
<Shelwien> | and beside your posts, there was also that guy with huffman compression for database records | 2010-01-21 18:38:08 |
<scott___> | I thought alot about DNA compression and I have some ideas to try out that will hopefully work pretty neat | 2010-01-21 18:38:17 |
| on here | 2010-01-21 18:38:31 |
| the guy on comp.compression | 2010-01-21 18:38:45 |
<Shelwien> | anyway, thanks to him, i just remembered about the possibility of bitcoding more compact than huffman :) | 2010-01-21 18:39:53 |
<scott___> | was his name glenn | 2010-01-21 18:40:45 |
<Shelwien> | yeah | 2010-01-21 18:40:55 |
<scott___> | was he just on what was his nick | 2010-01-21 18:41:05 |
<Shelwien> | he didn't appear here if you mean that :) | 2010-01-21 18:41:44 |
| i just read his post in comp.compression and it reminded me about nonprefix coding | 2010-01-21 18:42:33 |
| anyway, food calls, so later :) | 2010-01-21 18:42:41 |
<scott___> | oh I think comp.compression a step done from here so I think I post there different;y than here | 2010-01-21 18:43:10 |
| later then | 2010-01-21 18:43:34 |
<Shelwien> | yesterday I found that finereader 10 finally has support for japanese in the main branch | 2010-01-21 19:14:52 |
| (it seemed to have it long before too, but it was only distributed with SDK or something - wasn't able to find it) | 2010-01-21 19:15:32 |
| but i guess it was to early to be happy about that :) | 2010-01-21 19:16:04 |
| i tried OCRing a book | 2010-01-21 19:16:09 |
| and it seems that its still bad for this type of symbols | 2010-01-21 19:17:17 |
| in the end, even though finereader has some very nice GUI features | 2010-01-21 19:18:05 |
| i guess readiris which is used before is still better for japanese, despite its very simplistic interface | 2010-01-21 19:19:22 |
<scott___> | 1139718 on e.coli and not done yet | 2010-01-21 20:01:07 |
<Shelwien> | i guess we'd need to find more genetic data for this | 2010-01-21 20:02:33 |
| i mean, e.coli compressor is an even more weird thing than book1 compressor :) | 2010-01-21 20:03:07 |
<scott___> | well nature is making its own rules | 2010-01-21 20:03:26 |
*** toffer has joined the channel | 2010-01-21 21:00:42 |
<toffer> | hi | 2010-01-21 21:02:06 |
<Shelwien> | hi | 2010-01-21 21:16:05 |
| pmcontext expired | 2010-01-21 21:16:17 |
| matt added an entry for acb: http://mattmahoney.net/dc/text.html#2185 | 2010-01-21 21:22:08 |