*** complogger has joined the channel | 2009-09-28 10:15:18 |
*** compbooks has joined the channel | 2009-09-28 10:15:59 |
*** toffer has joined the channel | 2009-09-28 14:34:16 |
<toffer> | hi | 2009-09-28 14:34:29 |
*** pinc has left the channel | 2009-09-28 15:40:20 |
<Shelwien> | hi toffer | 2009-09-28 16:04:26 |
<toffer> | i recently found the term w* - w in the cost function for entropy optimal linear mixing, w* = (y-p0)/(p1-p0), p_mix = (p1-p0)*w + p0 | 2009-09-28 16:15:03 |
| and i improved my time series predictor | 2009-09-28 16:15:32 |
| my advisor at fraunhofer was surprised that i developed stuff similar to his approach (well the results of data analysis) for that | 2009-09-28 16:16:17 |
<Shelwien> | huh... | 2009-09-28 16:23:25 |
| its exactly how my linear mixer works... %) | 2009-09-28 16:23:37 |
<toffer> | there is a mistake | 2009-09-28 16:24:12 |
| i know | 2009-09-28 16:24:14 |
| it is "not y" in the numerator | 2009-09-28 16:24:25 |
| a single dp_mix/dw is 1/(w*-w) | 2009-09-28 16:24:53 |
| -1/(w*-w) | 2009-09-28 16:25:04 |
| but again that w*-w is in the denominator | 2009-09-28 16:25:21 |
| i just wanted to say that this does show similarities | 2009-09-28 16:25:42 |
| and again | 2009-09-28 16:27:31 |
| i forgot to insert the entropy | 2009-09-28 16:27:41 |
| to be exact | 2009-09-28 16:27:44 |
| y ln p_mix + (1-y) ln (1-p_mix) | 2009-09-28 16:28:10 |
| -[ y ln p_mix + (1-y) ln (1-p_mix) ] | 2009-09-28 16:28:22 |
| grml | 2009-09-28 16:28:24 |
| you know what i mean | 2009-09-28 16:28:29 |
| ^^ | 2009-09-28 16:28:29 |
| never made so many typos | 2009-09-28 16:28:36 |
| in such a short time | 2009-09-28 16:28:41 |
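A quick numerical check of the corrected gradient, as a sketch under assumptions the log does not state: p0 and p1 are both estimates of P(y=1), p_mix = (p1-p0)*w + p0, and the cost is H = -[ y ln p_mix + (1-y) ln (1-p_mix) ] in nats. With the "not y" numerator, w* = ((1-y) - p0)/(p1-p0) is the weight at which p_mix would hit the wrong extreme 1-y, and dH/dw comes out as 1/(w*-w). For the log-likelihood without the leading minus, the sign flips to -1/(w*-w). The function names are hypothetical.

```python
import math

# Hypothetical helper names. Assumes p0, p1 are both estimates of P(y=1)
# and p_mix = (p1 - p0) * w + p0 lies strictly between 0 and 1.

def coding_cost(w, p0, p1, y):
    """H = -[y ln p_mix + (1-y) ln(1-p_mix)], the code length of bit y in nats."""
    p_mix = (p1 - p0) * w + p0
    return -(y * math.log(p_mix) + (1 - y) * math.log(1 - p_mix))

def cost_gradient(w, p0, p1, y):
    """Closed-form dH/dw = 1/(w* - w), with the "not y" numerator in w*."""
    w_star = ((1 - y) - p0) / (p1 - p0)  # weight where p_mix would reach 1-y
    return 1.0 / (w_star - w)

def numeric_gradient(w, p0, p1, y, h=1e-6):
    """Central finite difference of coding_cost, used only for comparison."""
    return (coding_cost(w + h, p0, p1, y) - coding_cost(w - h, p0, p1, y)) / (2 * h)

if __name__ == "__main__":
    for p0, p1, y, w in [(0.2, 0.8, 1, 0.5), (0.2, 0.8, 0, 0.5), (0.9, 0.3, 1, 0.25)]:
        print(p0, p1, y, w, cost_gradient(w, p0, p1, y), numeric_gradient(w, p0, p1, y))
```

For p0 = 0.2, p1 = 0.8, w = 0.5 the mix is 0.5, and the gradient is -1.2 for y = 1 (raising w lowers the cost) and +1.2 for y = 0.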
<Shelwien> | ... | 2009-09-28 16:29:10 |
<toffer> | i just found it interesting | 2009-09-28 16:29:23 |
<Shelwien> | ...well, as to me, i was recently thinking about mapping of statistics | 2009-09-28 16:30:47 |
| like, computing p.d for DCT coefficients by p.d of image pixels ;) | 2009-09-28 16:31:25 |
<toffer> | why not | 2009-09-28 16:32:30 |
| that Bayesian stuff is really good for optimization | 2009-09-28 16:32:55 |
<Shelwien> | well, it has many uses actually | 2009-09-28 16:33:02 |
| i think it should be applicable for your time series as well | 2009-09-28 16:33:14 |
<toffer> | i can hardly change anything now | 2009-09-28 16:33:25 |
| it'll be finished in 2 to 3 weeks | 2009-09-28 16:33:32 |
| afterwards i'm gonna correct some mistakes | 2009-09-28 16:33:40 |
| and get it printed | 2009-09-28 16:33:42 |
| guess what the optimal solution for an entropy optimal linear mixer is | 2009-09-28 16:35:14 |
| i just calculated it | 2009-09-28 16:35:23 |
| it's an average of these w* | 2009-09-28 16:35:42 |
| w_opt = (sum c_k w*_k) / (sum c_k) | 2009-09-28 16:38:35 |
| that's pretty much like a linear counter | 2009-09-28 16:38:46 |
| but y_k is replaced with w* | 2009-09-28 16:39:02 |
| w*_k | 2009-09-28 16:39:09 |
| in the stationary case k->oo it's like exponential decay of weights | 2009-09-28 16:39:56 |
| w*_k | 2009-09-28 16:40:03 |
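One way to read where the weighted average comes from, under the same assumptions (p0, p1 estimate P(y=1), w* uses the "not y" numerator): setting the summed gradient sum_k 1/(w*_k - w) to zero and writing each term as c_k (w*_k - w) with c_k = 1/(w*_k - w)^2 gives w = (sum c_k w*_k) / (sum c_k). The c_k here are this reading's assumption, since the log does not define them, and they depend on w, so this is a fixed-point condition rather than a direct formula. The sketch finds the optimum by bisection on the gradient, which is increasing because the cost is convex, keeping w in [0, 1] so p_mix stays between two inputs in (0, 1); it then checks the identity. All names are hypothetical.

```python
# Hypothetical names. Assumes every sample (p0, p1, y) has p0, p1 in (0, 1),
# both estimating P(y=1), and the mixer weight is restricted to [0, 1].

def total_gradient(w, samples):
    """Sum over samples of dH/dw = 1/(w*_k - w); samples with p0 == p1 are skipped."""
    g = 0.0
    for p0, p1, y in samples:
        d = p1 - p0
        if d == 0.0:
            continue  # equal inputs: the weight does not affect this bit
        w_star = ((1 - y) - p0) / d
        g += 1.0 / (w_star - w)
    return g

def optimal_weight(samples, iters=100):
    """Bisection on the increasing gradient over [0, 1]."""
    lo, hi = 0.0, 1.0
    if total_gradient(lo, samples) >= 0:
        return lo  # cost already increasing at w = 0
    if total_gradient(hi, samples) <= 0:
        return hi  # cost still decreasing at w = 1
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if total_gradient(mid, samples) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def weighted_average_of_wstar(w, samples):
    """sum c_k w*_k / sum c_k with c_k = 1/(w*_k - w)^2 (assumed weighting)."""
    num = den = 0.0
    for p0, p1, y in samples:
        d = p1 - p0
        if d == 0.0:
            continue
        w_star = ((1 - y) - p0) / d
        c = 1.0 / (w_star - w) ** 2
        num += c * w_star
        den += c
    return num / den

if __name__ == "__main__":
    # Constant inputs 0.3 / 0.8 and 6 ones out of 10 bits: the best p_mix is 0.6,
    # i.e. w = (0.6 - 0.3) / (0.8 - 0.3) = 0.6.
    samples = [(0.3, 0.8, 1)] * 6 + [(0.3, 0.8, 0)] * 4
    w = optimal_weight(samples)
    print(w, weighted_average_of_wstar(w, samples))
```

With constant inputs the optimum is known in closed form (p_mix equals the observed frequency of ones), which makes the toy data a convenient check; on real data the c_k would vary from bit to bit.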
<Shelwien> | whatever... it all depends on a practical implementation anyway | 2009-09-28 16:40:11 |
<toffer> | so my approximation and your counter backprop is all the same | 2009-09-28 16:40:19 |
| at least you now know that your scheme is optimal | 2009-09-28 16:40:51 |
| well implementation is step 2 right after knowing what to do | 2009-09-28 16:41:39 |
| did you use P(y=0) or P(y=1) | 2009-09-28 16:43:03 |
<Shelwien> | my models usually work with P(y=0) | 2009-09-28 16:45:45 |
<toffer> | well in the case w* is (y-p0)/(p1-p0) ;) | 2009-09-28 16:46:31 |
| that case | 2009-09-28 16:46:36 |
<Shelwien> | i guess ;) | 2009-09-28 16:46:52 |
| but why linear mixer suddenly? ;) | 2009-09-28 16:47:14 |
<toffer> | since a piecewise linear function can approximate anything | 2009-09-28 16:48:23 |
| as you know for linear mixing i looked up weights in a context of p0,p1 | 2009-09-28 16:48:39 |
| and nonlinear weighting can be emulated by making w a (linear) function of nonlinear transformations of p0,p1 | 2009-09-28 16:49:54 |
| thus i can mirror the behaviour of stretch/squash | 2009-09-28 16:50:10 |
| while still having the ability to get an optimal solution | 2009-09-28 16:50:24 |
<Shelwien> | %) | 2009-09-28 16:50:29 |
<toffer> | which cannot be calculated for paq style mixing | 2009-09-28 16:50:59 |
<Shelwien> | well, i'd like to somehow extend the max likelihood approach (from logistic mixing) to some other areas | 2009-09-28 16:51:13 |
<toffer> | at least not in a closed form | 2009-09-28 16:51:17 |
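Toffer's remark that he looked up the linear-mixing weight in a context of p0, p1 can be sketched as a small table of weights indexed by quantized inputs, each entry trained by a per-bit gradient step on the coding cost. Since p_mix stays affine in whichever entry is selected, the coding cost of the bits landing in one cell is still convex in that cell's weight, which is what keeps an optimal solution reachable. The class name, bucket count, learning rate and clamping to [0, 1] are hypothetical choices; the variant with nonlinear transformations of p0, p1 is not shown.

```python
class ContextWeightedLinearMix:
    """Hypothetical linear mixer whose weight is looked up by quantized (p0, p1).

    Assumes p0, p1 are estimates of P(y=1) strictly inside (0, 1)."""

    def __init__(self, buckets=8, rate=0.02):
        self.buckets = buckets  # quantization levels per input (assumed value)
        self.rate = rate        # learning rate (assumed value)
        self.w = [[0.5] * buckets for _ in range(buckets)]

    def _bucket(self, p):
        return min(self.buckets - 1, int(p * self.buckets))

    def _ctx(self, p0, p1):
        return self._bucket(p0), self._bucket(p1)

    def predict(self, p0, p1):
        i, j = self._ctx(p0, p1)
        return (p1 - p0) * self.w[i][j] + p0

    def update(self, p0, p1, y):
        d = p1 - p0
        if d == 0.0:
            return  # the weight has no effect on this bit
        i, j = self._ctx(p0, p1)
        p = d * self.w[i][j] + p0
        grad = -(y - p) * d / (p * (1 - p))  # dH/dw for H = -ln P(observed bit)
        # gradient step on this cell only; keep w in [0, 1]
        self.w[i][j] = min(1.0, max(0.0, self.w[i][j] - self.rate * grad))

if __name__ == "__main__":
    m = ContextWeightedLinearMix()
    for _ in range(200):
        m.update(0.2, 0.8, 1)  # in this context the second input is right
        m.update(0.7, 0.3, 1)  # in this context the first input is right
    print(m.predict(0.2, 0.8), m.predict(0.7, 0.3))
```

The two contexts end up with opposite weights, which a single global weight could not represent.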
<Shelwien> | but until now couldn't really think of anything | 2009-09-28 16:51:28 |
<toffer> | parameter estimation? | 2009-09-28 16:51:38 |
| why don't you make an ew optimizer | 2009-09-28 16:53:24 |
| new | 2009-09-28 16:53:26 |
| that Bayesian stuff i read about simultaneously estimates: 1. cost function parameters and 2. search process parameters | 2009-09-28 16:54:02 |
| that sounded specific | 2009-09-28 16:54:15 |
| but was pretty much general | 2009-09-28 16:54:23 |
| and very powerful | 2009-09-28 16:54:26 |
<Shelwien> | no, i wasn't talking about optimization | 2009-09-28 16:58:46 |
| if you remember, there was that explanation of logistic mixing via string probabilities | 2009-09-28 16:59:24 |
<toffer> | for me it's still more an interpretation than explanation | 2009-09-28 16:59:57 |
<Shelwien> | so logistic mixer appeared to be an approximation of static switching of static estimations | 2009-09-28 16:59:58 |
| Matt clearly explained after that that it was derived purely ad hoc | 2009-09-28 17:00:55 |
| and my interpretation or whatever at least provides a "physical" background for it | 2009-09-28 17:01:57 |
| anyway, it looks fairly solid for me - with verification by alternative update formula and all | 2009-09-28 17:03:02 |
<toffer> | the intention of optimization is enough for me | 2009-09-28 17:03:10 |
| and the characteristic of the contained transform | 2009-09-28 17:03:19 |
<Shelwien> | so i'd like to apply a similar approach somewhere else | 2009-09-28 17:03:44 |
| for example, maybe we can somehow build a logistic mixer for p=~0.5 | 2009-09-28 17:04:41 |
| which would be more efficient than linear one ;) | 2009-09-28 17:04:53 |
<toffer> | near 0.5 logistic mixing behaves almost like linear mixing | 2009-09-28 17:05:38 |
<Shelwien> | yeah, but its worse | 2009-09-28 17:05:58 |
<toffer> | not really in that case | 2009-09-28 17:06:17 |
| but the logistic behaviour is very different near 0/1 | 2009-09-28 17:06:29 |
<Shelwien> | maybe only a little, but taking into account that linear mixing is also much simpler... | 2009-09-28 17:06:30 |
| as i mentioned, i tried it | 2009-09-28 17:06:46 |
<toffer> | especially for weights far from 0/1 | 2009-09-28 17:06:46 |
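A small numeric illustration of the two claims above (nearly linear near 0.5, very different near 0/1), assuming a generic floating-point 2-input logistic mix squash(w*stretch(p1) + (1-w)*stretch(p0)), with stretch(p) = ln(p/(1-p)) and squash its inverse, compared against the linear mix (p1-p0)*w + p0. This form is chosen for illustration; it is not paq's or Shelwien's integer mix2.

```python
import math

# Assumed floating-point forms for illustration only.

def stretch(p):
    return math.log(p / (1 - p))

def squash(x):
    return 1 / (1 + math.exp(-x))

def linear_mix(p0, p1, w):
    return (p1 - p0) * w + p0

def logistic_mix(p0, p1, w):
    return squash(w * stretch(p1) + (1 - w) * stretch(p0))

if __name__ == "__main__":
    w = 0.3
    for p0, p1 in [(0.45, 0.55), (0.01, 0.9)]:
        print(p0, p1, linear_mix(p0, p1, w), logistic_mix(p0, p1, w))
```

Near 0.5, stretch(p) is close to 4(p - 0.5), so both mixes give about 0.48 for the first pair. With an input at 0.01 and weight 0.7 on it, the logistic result drops to roughly 0.07 while the linear one stays at 0.277: a confident input dominates the logistic mix.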
<Shelwien> | was expecting a lot after mix_test experiments | 2009-09-28 17:07:07 |
| and tried to apply that logistic mix2 in my audio codec entropy model | 2009-09-28 17:07:39 |
| and it was significantly worse in the same circumstances (with only mixer parameters optimized) | 2009-09-28 17:08:33 |
<toffer> | logistic mixing just works well iff predictions near 0/1 provide a high confidence | 2009-09-28 17:08:42 |
<Shelwien> | well, as i said, my case is more about p=~0.5 | 2009-09-28 17:09:09 |
| like that zero flag model | 2009-09-28 17:09:20 |
| but i wonder whether that "explanation" can be adjusted somehow | 2009-09-28 17:10:35 |
| like, maybe single ~0.5 prediction can be split into a mix of 0/1 and 0.5 | 2009-09-28 17:10:59 |
<toffer> | the actual question is why the overall result is worse: since logistic mixing doesn't behave "linear" enough for 0.5 or since it behaves "logistic" for other values | 2009-09-28 17:11:20 |
<Shelwien> | actually it could be explained by another feature of my linear mixer | 2009-09-28 17:12:29 |
| it only updates the weight when distance between inputs is large enough | 2009-09-28 17:13:01 |
<toffer> | the essence is the same compared to my previous statement. | 2009-09-28 17:14:00 |
| since it depends on the model output | 2009-09-28 17:14:13 |
<Shelwien> | so that mixer can leave some cases untouched | 2009-09-28 17:14:32 |
| like, one of inputs is good enough, so lets leave it as is | 2009-09-28 17:15:00 |
| but logistic mixer always transforms the inputs | 2009-09-28 17:15:26 |
| and the integer implementation is not very precise | 2009-09-28 17:15:44 |
<toffer> | well logistic mixing itself doesn't. but the optimization technique matt proposed | 2009-09-28 17:17:12 |
| and you don't need to do that | 2009-09-28 17:18:30 |
<Shelwien> | i don't understand you at all ;) | 2009-09-28 17:19:10 |
<toffer> | the update rule in paq is a gradient step | 2009-09-28 17:19:34 |
| that is implementation | 2009-09-28 17:19:42 |
| you can do it differently | 2009-09-28 17:19:49 |
<Shelwien> | i was saying that one explanation for logistic mixing having poor performance in that case | 2009-09-28 17:22:03 |
| is precision/performance issues in my mix2 | 2009-09-28 17:22:22 |
<toffer> | yes but you don't need to "update" if p0 ~ p1 | 2009-09-28 17:24:30 |
<Shelwien> | yeah, and i tried adding that condition, and it helped a little even | 2009-09-28 17:25:09 |
| my point was that logistic mixer always modifies the inputs | 2009-09-28 17:25:44 |
<toffer> | you'd need to do that with the w* approach in the linear domain, too | 2009-09-28 17:26:05 |
| it's just an implementation issue | 2009-09-28 17:26:13 |
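A minimal sketch of the two implementation points discussed above, assuming a floating-point 2-input linear mixer over P(y=1): the weight update is a plain gradient step on the coding cost, dH/dw = -(y - p_mix)(p1 - p0)/(p_mix(1 - p_mix)) (the same quantity as 1/(w*-w) with the "not y" w*), and the update is skipped when |p1 - p0| is below a threshold. The class name, learning rate, threshold and clamping are hypothetical choices, not Shelwien's or paq's actual code.

```python
class LinearMix2:
    """Hypothetical 2-input linear mixer over P(y=1) with a gradient-step update."""

    def __init__(self, w=0.5, rate=0.02, min_gap=0.05):
        self.w = w              # weight of input p1; p0 gets 1 - w
        self.rate = rate        # learning rate (assumed value)
        self.min_gap = min_gap  # skip updates when |p1 - p0| < min_gap (assumed value)

    def predict(self, p0, p1):
        return (p1 - p0) * self.w + p0

    def update(self, p0, p1, y):
        d = p1 - p0
        if abs(d) < self.min_gap:
            return False  # inputs nearly agree: leave the weight untouched
        p = self.predict(p0, p1)
        grad = -(y - p) * d / (p * (1 - p))  # dH/dw for H = -ln P(observed bit)
        # gradient step, then keep w in [0, 1] so p stays between p0 and p1
        self.w = min(1.0, max(0.0, self.w - self.rate * grad))
        return True

if __name__ == "__main__":
    m = LinearMix2()
    for y in [1, 0, 1, 1, 0, 0, 1, 0] * 20:
        p1 = 0.9 if y else 0.1  # toy input that always leans toward the actual bit
        p0 = 0.5                # toy uninformative input
        m.update(p0, p1, y)
    print(m.w)
```

On this toy data the weight climbs to 1, while an update with inputs 0.5 and 0.52 falls under min_gap and leaves w unchanged.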
<Shelwien> | ...anyway, thats not really interesting | 2009-09-28 17:26:32 |
<toffer> | that's right, definitely :) | 2009-09-28 17:26:50 |
<Shelwien> | i was trying to discuss alternative applications of the same string probability approach | 2009-09-28 17:26:53 |
| for example, CTW is one of them | 2009-09-28 17:27:08 |
| or, for another example | 2009-09-28 17:27:45 |
| we can try replacing the switching in the formula | 2009-09-28 17:27:59 |
| with linear mixing | 2009-09-28 17:28:02 |
<toffer> | well i don't know that much about ctw since i lost interest in it after noticing that it's rather inefficient | 2009-09-28 17:29:37 |
| the other idea might be interesting for suer | 2009-09-28 17:30:10 |
| sure | 2009-09-28 17:30:11 |
<Shelwien> | then there's one more thing | 2009-09-28 17:30:52 |
| afair, in my likelihood interpretation for logistic mixing | 2009-09-28 17:31:10 |
| the bit probability estimation was derived from likelihoods | 2009-09-28 17:31:35 |
| and mixer weight was optimized by only that | 2009-09-28 17:32:24 |
| but maybe we should try optimizing by the whole codelength | 2009-09-28 17:33:34 |
| well, i already did that with BFA though | 2009-09-28 17:33:49 |
| and current mixer implementations (both logistic and linear) seem to have better performance | 2009-09-28 17:34:25 |
<toffer> | these are adaptive | 2009-09-28 17:34:54 |
| bfa is limited to a fixed window | 2009-09-28 17:35:01 |
<Shelwien> | BFA is too, of course | 2009-09-28 17:35:03 |
| not really, though there was one like that too | 2009-09-28 17:35:21 |
<toffer> | afaik bfa searches for an optimal parameter | 2009-09-28 17:35:34 |
| via bruteforcing | 2009-09-28 17:35:36 |
<Shelwien> | most straightforward BFA implementation | 2009-09-28 17:35:48 |
<toffer> | for a *single* situation | 2009-09-28 17:35:55 |
| but | 2009-09-28 17:35:57 |
| both paq and linear mixer updates are based on adaptation taking previous results into account | 2009-09-28 17:36:23 |
<Shelwien> | is accumulating (exponentially weighted) codelengths with different parameter values | 2009-09-28 17:36:24 |
<toffer> | that is inefficient | 2009-09-28 17:36:34 |
| since it introduces phase delays | 2009-09-28 17:36:42 |
| which isn't the case for gradient descent | 2009-09-28 17:36:59 |
<Shelwien> | in a way, but that's not always bad | 2009-09-28 17:37:04 |
<toffer> | well, yes | 2009-09-28 17:37:38 |
| but you might imagine what iterative minimization actually does | 2009-09-28 17:38:09 |
| the cost function changes slightly each step | 2009-09-28 17:38:25 |
| thus you can assume that between a few steps it is constant | 2009-09-28 17:38:39 |
| and the parameter noise gets averaged | 2009-09-28 17:38:56 |
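Based only on Shelwien's one-line description (the most straightforward BFA accumulates exponentially weighted codelengths with different parameter values), a minimal sketch for a single linear-mixer weight might look like the following. The class name, candidate grid, decay factor and the rule of predicting with the currently cheapest candidate are assumptions for illustration, not the actual BFA code.

```python
import math

class BruteForceWeight:
    """Hypothetical BFA-style selector: a decayed code length per candidate weight.

    Assumes p0, p1 estimate P(y=1) strictly inside (0, 1)."""

    def __init__(self, candidates=None, decay=0.98):
        self.candidates = candidates or [i / 16 for i in range(17)]
        self.decay = decay                        # forgetting factor (assumed)
        self.cost = [0.0] * len(self.candidates)  # decayed code length in bits

    def best_weight(self):
        i = min(range(len(self.cost)), key=self.cost.__getitem__)
        return self.candidates[i]

    def predict(self, p0, p1):
        return (p1 - p0) * self.best_weight() + p0

    def update(self, p0, p1, y):
        for i, w in enumerate(self.candidates):
            p = (p1 - p0) * w + p0  # P(y=1) under this candidate weight
            bits = -math.log2(p if y else 1 - p)
            self.cost[i] = self.decay * self.cost[i] + bits

if __name__ == "__main__":
    b = BruteForceWeight()
    for y in [1, 0, 1, 1, 0, 0, 1, 0] * 20:
        b.update(0.5, 0.9 if y else 0.1, y)  # toy inputs: p1 leans toward the bit
    print(b.best_weight())
```

Because each candidate's score is a decayed sum over past bits, the selected weight follows changes in the data with a lag; this is the phase delay toffer contrasts with a per-bit gradient step.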
| and i need to get into a supermarket now to buy something to eat | 2009-09-28 17:39:41 |
<Shelwien> | ;) | 2009-09-28 17:39:46 |
<toffer> | i'll be back in 30mins | 2009-09-28 17:39:47 |
<Shelwien> | good luck ;) | 2009-09-28 17:39:53 |
<toffer> | and that dna guy is really annoying to me | 2009-09-28 17:40:36 |
<Shelwien> | yeah, but dna sequence itself is interesting | 2009-09-28 17:41:12 |
| currently using it for FMA testing ;) | 2009-09-28 17:41:26 |
| (e.coli) | 2009-09-28 17:41:31 |
| also there's stuff like "complementary polyndromes" | 2009-09-28 17:44:56 |
| about which i'm not sure how to model it with CM | 2009-09-28 17:45:17 |
| these specialized implementations which support it, though | 2009-09-28 17:45:50 |
| are LZ-like and have worse results than ash ;) | 2009-09-28 17:46:23 |
| *palindromes | 2009-09-28 17:53:11 |
| ;) | 2009-09-28 17:53:11 |
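In DNA compression work, a "complementary palindrome" (inverted repeat) usually means a substring that reappears later as its reverse complement, i.e. read backwards with A/T and C/G swapped, as it would appear on the opposite strand. A minimal sketch of detecting exact repeats of this kind with a fixed length k; the function names are hypothetical and this is not how any particular compressor handles it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse the sequence and swap A<->T, C<->G (the opposite strand, read 5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]

def reverse_complement_repeats(seq, k):
    """Pairs (i, j), i < j, where seq[j:j+k] is the reverse complement of seq[i:i+k].

    Only the first earlier occurrence is reported. An LZ-style coder could
    encode the window at j as a reference to i (hypothetical use)."""
    first_seen = {}
    pairs = []
    for j in range(len(seq) - k + 1):
        window = seq[j:j + k]
        rc = reverse_complement(window)
        if rc in first_seen:
            pairs.append((first_seen[rc], j))
        first_seen.setdefault(window, j)
    return pairs

if __name__ == "__main__":
    print(reverse_complement("GAATTC"))  # the EcoRI site is its own reverse complement
    print(reverse_complement_repeats("ACCGTTTTTACGGT", 5))
```

An odd-length sequence can never equal its own reverse complement, since the middle base would have to be its own complement.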
*** pinc has joined the channel | 2009-09-28 19:18:48 |
*** pinc has left the channel | 2009-09-28 19:49:28 |
| !quit | 2009-09-28 19:54:57 |