*** complogger has joined the channel		2009-09-28 10:15:18
*** compbooks has joined the channel		2009-09-28 10:15:59
*** toffer has joined the channel		2009-09-28 14:34:16
<toffer>	hi	2009-09-28 14:34:29
*** pinc has left the channel		2009-09-28 15:40:20
<Shelwien>	hi toffer	2009-09-28 16:04:26
<toffer>	i recently found the term w* - w in the cost function for entropy optimal linear mixing, w* = (y-p0)/(p1-p0), p_mix = (p1-p0)*w + p0	2009-09-28 16:15:03
	and i improved my time series predictor	2009-09-28 16:15:32
	my advisor at fraunhofer was surprised that i developed stuff similar to his approach (well the results of data analysis) for that	2009-09-28 16:16:17
<Shelwien>	huh...	2009-09-28 16:23:25
	its exactly how my linear mixer works... %)	2009-09-28 16:23:37
<toffer>	there is a mistake	2009-09-28 16:24:12
	i know	2009-09-28 16:24:14
	it is "not y" in the numerator	2009-09-28 16:24:25
	a single dp_mix/dw is 1/(w*-w)	2009-09-28 16:24:53
	-1/(w*-w)	2009-09-28 16:25:04
	but again that w*-w is in the denominator	2009-09-28 16:25:21
	i just wanted to say that this does show similarities	2009-09-28 16:25:42
	and again	2009-09-28 16:27:31
	i forgot to insert the entropy	2009-09-28 16:27:41
	to be exact	2009-09-28 16:27:44
	y ln p_mix + (1-y) ln(1-p_mix)	2009-09-28 16:28:10
	-[ y ln p_mix + (1-y) ln(1-p_mix) ]	2009-09-28 16:28:22
	grml	2009-09-28 16:28:24
	you know what i mean	2009-09-28 16:28:29
	^^	2009-09-28 16:28:29
	never made that many typos	2009-09-28 16:28:36
	in such a short time	2009-09-28 16:28:41
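A minimal sketch of the corrected statement above, assuming p0, p1 and p_mix are estimates of P(y=1) (the log does not fix the convention at this point). It checks numerically that the derivative of the coding cost -[ y ln p_mix + (1-y) ln(1-p_mix) ] with respect to w equals 1/(w*-w) when w* uses "not y" in the numerator. The function names are illustrative, not from the chat.

```python
import math

def p_mix(w, p0, p1):
    # Linear mix as written in the chat: p_mix = (p1-p0)*w + p0
    return (p1 - p0) * w + p0

def cost(w, p0, p1, y):
    # Coding cost in nats, assuming probabilities are for y = 1
    p = p_mix(w, p0, p1)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def w_star(p0, p1, y):
    # "not y" in the numerator, as corrected in the chat
    return ((1 - y) - p0) / (p1 - p0)

def analytic_grad(w, p0, p1, y):
    # Claimed form of d(cost)/dw
    return 1.0 / (w_star(p0, p1, y) - w)

def numeric_grad(w, p0, p1, y, h=1e-6):
    # Central difference, used only to cross-check the claim
    return (cost(w + h, p0, p1, y) - cost(w - h, p0, p1, y)) / (2 * h)
```

Under the opposite convention (probabilities of y=0) the roles of y and "not y" swap, which gives w* = (y-p0)/(p1-p0) again.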
<Shelwien>	...	2009-09-28 16:29:10
<toffer>	i just found it interesting	2009-09-28 16:29:23
<Shelwien>	...well, as to me, i was recently thinking about mapping of statistics	2009-09-28 16:30:47
	like, computing p.d for DCT coefficients by p.d of image pixels ;)	2009-09-28 16:31:25
<toffer>	why not	2009-09-28 16:32:30
	that bayesian stuff is really good for optimization	2009-09-28 16:32:55
<Shelwien>	well, it has many uses actually	2009-09-28 16:33:02
	i think it should be applicable for your time series as well	2009-09-28 16:33:14
<toffer>	i can hardly change anything now	2009-09-28 16:33:25
	it'll be finished in 2 to 3 weeks	2009-09-28 16:33:32
	afterwards i'm gonna correct some mistakes	2009-09-28 16:33:40
	and get it printed	2009-09-28 16:33:42
	guess what the optimal solution for an entropy optimal linear mixer is	2009-09-28 16:35:14
	i just calculated it	2009-09-28 16:35:23
	it's an average of these w*	2009-09-28 16:35:42
	w_opt = (sum c_k w*_k) / (sum c_k)	2009-09-28 16:38:35
	that's pretty much like a linear counter	2009-09-28 16:38:46
	but y_k is replaced with w*	2009-09-28 16:39:02
	w*_k	2009-09-28 16:39:09
	in the stationary case k->oo it's like exponential decay of weights	2009-09-28 16:39:56
	w*_k	2009-09-28 16:40:03
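A rough sketch of the averaging formula w_opt = (sum c_k w*_k) / (sum c_k), assuming exponentially decaying counts c_k = r^(n-k). The decay factor r and the function names are assumptions for illustration; the w*_k values are taken as given inputs. The recursive form reproduces the batch average exactly, and once the denominator has converged to 1/(1-r) it reduces to a counter-style update with y_k replaced by w*_k.

```python
def batch_average(w_stars, r):
    # Direct evaluation of (sum c_k w*_k) / (sum c_k) with c_k = r^(n-1-k)
    # (0-based k, newest sample has weight 1). Needs a non-empty list.
    n = len(w_stars)
    num = sum(r ** (n - 1 - k) * ws for k, ws in enumerate(w_stars))
    den = sum(r ** (n - 1 - k) for k in range(n))
    return num / den

def recursive_average(w_stars, r):
    # Same quantity, accumulated one sample at a time
    num = den = 0.0
    for ws in w_stars:
        num = r * num + ws
        den = r * den + 1.0
    return num / den

def counter_update(w, w_star, r):
    # Stationary limit (den -> 1/(1-r)): a linear counter fed with w*
    return w + (1.0 - r) * (w_star - w)
```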
<Shelwien>	whatever... it all depends on a practical implementation anyway	2009-09-28 16:40:11
<toffer>	so my approximation and your counter backprop is all the same	2009-09-28 16:40:19
	at least you now know that your scheme is optimal	2009-09-28 16:40:51
	well implementation is step 2 right after knowing what to do	2009-09-28 16:41:39
	did you use P(y=0) or P(y=1)	2009-09-28 16:43:03
<Shelwien>	my models usually work with P(y=0)	2009-09-28 16:45:45
<toffer>	well in the case w* is (y-p0)/(p1-p0) ;)	2009-09-28 16:46:31
	that case	2009-09-28 16:46:36
<Shelwien>	i guess ;)	2009-09-28 16:46:52
	but why linear mixer suddenly? ;)	2009-09-28 16:47:14
<toffer>	since a piecewise linear function can approximate anything	2009-09-28 16:48:23
	as you know for linear mixing i looked up weights in a context of p0,p1	2009-09-28 16:48:39
	and nonlinear weighting can be emulated by making w a (linear) function of nonlinear transformations of p0,p1	2009-09-28 16:49:54
	thus i can mirror the behaviour of stretch/squash	2009-09-28 16:50:10
	while still having the ability to get an optimal solution	2009-09-28 16:50:24
<Shelwien>	%)	2009-09-28 16:50:29
<toffer>	which cannot be calculated for paq style mixing	2009-09-28 16:50:59
<Shelwien>	well, i'd like to somehow extend the max likelihood approach (from logistic mixing) to some other areas	2009-09-28 16:51:13
<toffer>	at least not in a closed form	2009-09-28 16:51:17
<Shelwien>	but until now couldn't really think of anything	2009-09-28 16:51:28
<toffer>	parameter estimation?	2009-09-28 16:51:38
	why don't you make an ew optimizer	2009-09-28 16:53:24
	new	2009-09-28 16:53:26
	that bayesian stuff i read about simultaneously estimates: 1. cost function parameters and 2. search process parameters	2009-09-28 16:54:02
	that sounded specific	2009-09-28 16:54:15
	but was pretty much general	2009-09-28 16:54:23
	and very powerful	2009-09-28 16:54:26
<Shelwien>	no, i wasn't talking about optimization	2009-09-28 16:58:46
	if you remember, there was that explanation of logistic mixing via string probabilities	2009-09-28 16:59:24
<toffer>	for me it's still more an interpretation than explanation	2009-09-28 16:59:57
<Shelwien>	so logistic mixer appeared to be an approximation of static switching of static estimations	2009-09-28 16:59:58
	Matt clearly explained after that that it was derived purely ad hoc	2009-09-28 17:00:55
	and my interpretation or whatever at least provides a "physical" background for it	2009-09-28 17:01:57
	anyway, it looks fairly solid for me - with verification by alternative update formula and all	2009-09-28 17:03:02
<toffer>	the intention of optimization is enough for me	2009-09-28 17:03:10
	and the characteristic of the contained transform	2009-09-28 17:03:19
<Shelwien>	so i'd like to apply a similar approach somewhere else	2009-09-28 17:03:44
	for example, maybe we can somehow build a logistic mixer for p=~0.5	2009-09-28 17:04:41
	which would be more efficient than linear one ;)	2009-09-28 17:04:53
<toffer>	near 0.5 logistic mixing behaves almost like linear mixing	2009-09-28 17:05:38
<Shelwien>	yeah, but its worse	2009-09-28 17:05:58
<toffer>	not really in that case	2009-09-28 17:06:17
	but the logistic behaviour is very different near 0/1	2009-09-28 17:06:29
<Shelwien>	maybe only a little, but taking into account that linear mixing is also much simpler...	2009-09-28 17:06:30
	as i mentioned, i tried it	2009-09-28 17:06:46
<toffer>	especially for weights far from 0/1	2009-09-28 17:06:46
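To illustrate the claim that logistic mixing is close to linear mixing near 0.5 and very different near 0/1, here is a small comparison. It assumes a two-input mixer with weights w and 1-w in both domains, which is a simplification and not necessarily the mix2 discussed in the chat.

```python
import math

def stretch(p):
    # Logit: ln(p / (1-p))
    return math.log(p / (1.0 - p))

def squash(x):
    # Inverse of stretch
    return 1.0 / (1.0 + math.exp(-x))

def linear_mix(p0, p1, w):
    # Weighted average in the probability domain
    return w * p0 + (1.0 - w) * p1

def logistic_mix(p0, p1, w):
    # Weighted average in the stretched (logit) domain
    return squash(w * stretch(p0) + (1.0 - w) * stretch(p1))

def gap(p0, p1, w):
    # Absolute difference between the two mixing rules
    return abs(linear_mix(p0, p1, w) - logistic_mix(p0, p1, w))
```

With inputs such as 0.48 and 0.55 the two outputs nearly coincide, while inputs such as 0.01 and 0.6 give clearly different results, because the stretched domain gives far more pull to the prediction near 0.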
<Shelwien>	was expecting a lot after mix_test experiments	2009-09-28 17:07:07
	and tried to apply that logistic mix2 in my audio codec entropy model	2009-09-28 17:07:39
	and it was significantly worse in the same circumstances (with only mixer parameters optimized)	2009-09-28 17:08:33
<toffer>	logistic mixing just works well iff predictions near 0/1 provide a high confidence	2009-09-28 17:08:42
<Shelwien>	well, as i said, my case is more about p=~0.5	2009-09-28 17:09:09
	like that zero flag model	2009-09-28 17:09:20
	but i wonder whether that "explanation" can be adjusted somehow	2009-09-28 17:10:35
	like, maybe single ~0.5 prediction can be split into a mix of 0/1 and 0.5	2009-09-28 17:10:59
<toffer>	the actual question is why the overall result is worse: since logistic mixing doesn't behave "linear" enough for 0.5 or since it behaves "logistic" for other values	2009-09-28 17:11:20
<Shelwien>	actually it could be explained by another feature of my linear mixer	2009-09-28 17:12:29
	it only updates the weight when distance between inputs is large enough	2009-09-28 17:13:01
<toffer>	the essence is the same compared to my previous statement.	2009-09-28 17:14:00
	since it depends on the model output	2009-09-28 17:14:13
<Shelwien>	so that mixer can leave some cases untouched	2009-09-28 17:14:32
	like, one of inputs is good enough, so lets leave it as is	2009-09-28 17:15:00
	but logistic mixer always transforms the inputs	2009-09-28 17:15:26
	and the integer implementation is not very precise	2009-09-28 17:15:44
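A hypothetical sketch of the gated update described above for the linear mixer: the weight is left untouched when the two inputs are close, and otherwise pulled toward w* = (y-p0)/(p1-p0). The threshold min_gap, the rate, the clamp to [0, 1] and the convention that probabilities are for y=1 are all assumptions for illustration, not details of Shelwien's implementation.

```python
def gated_linear_update(w, p0, p1, y, rate=0.02, min_gap=0.05):
    # Skip the update when the inputs nearly agree: the mix barely
    # depends on w there, and w* = (y-p0)/(p1-p0) becomes unstable.
    gap = p1 - p0
    if abs(gap) < min_gap:
        return w
    # Weight that would have made p_mix equal to the observed bit
    w_target = (y - p0) / gap
    # Assumed clamp so the mix stays a convex combination of the inputs
    w_target = min(1.0, max(0.0, w_target))
    # Counter-style step toward the target
    return w + rate * (w_target - w)
```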
<toffer>	well logistic mixing itself doesn't. but the optimization technique matt proposed	2009-09-28 17:17:12
	and you don't need to do that	2009-09-28 17:18:30
<Shelwien>	i don't understand you at all ;)	2009-09-28 17:19:10
<toffer>	the update rule in paq is a gradient step	2009-09-28 17:19:34
	that is implementation	2009-09-28 17:19:42
	you can do it differently	2009-09-28 17:19:49
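For reference, a floating-point sketch of a gradient step on coding cost for a logistic mixer, in the spirit of the paq-style update mentioned above. The learning rate and function names are illustrative, probabilities are assumed to be for y=1, and the real paq code uses fixed-point integer arithmetic instead.

```python
import math

def stretch(p):
    return math.log(p / (1.0 - p))

def squash(x):
    return 1.0 / (1.0 + math.exp(-x))

def logistic_mix(weights, probs):
    # p = squash(sum_i w_i * stretch(p_i))
    return squash(sum(w * stretch(p) for w, p in zip(weights, probs)))

def gradient_step(weights, probs, y, lr=0.02):
    # Gradient of -ln P(y) w.r.t. w_i is -(y - p) * stretch(p_i),
    # so descending the cost adds lr * err * stretch(p_i).
    err = y - logistic_mix(weights, probs)
    return [w + lr * err * stretch(p) for w, p in zip(weights, probs)]
```

The step is only one way to minimise the cost; the mixing function itself does not prescribe how the weights are found.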
<Shelwien>	i was saying that one explanation for logistic mixing having poor performance in that case	2009-09-28 17:22:03
	is precision/performance issues in my mix2	2009-09-28 17:22:22
<toffer>	yes but you don't need to "update" if p0 ~ p1	2009-09-28 17:24:30
<Shelwien>	yeah, and i tried adding that condition, and it helped a little even	2009-09-28 17:25:09
	my point was that logistic mixer always modifies the inputs	2009-09-28 17:25:44
<toffer>	you'd need to do that with the w* approach in the linear domain, too	2009-09-28 17:26:05
	it's just an implementation issue	2009-09-28 17:26:13
<Shelwien>	...anyway, thats not really interesting	2009-09-28 17:26:32
<toffer>	that's right, definitely :)	2009-09-28 17:26:50
<Shelwien>	i was trying to discuss alternative applications of the same string probability approach	2009-09-28 17:26:53
	for example, CTW is one of them	2009-09-28 17:27:08
	or, for another example	2009-09-28 17:27:45
	we can try replacing the switching in the formula	2009-09-28 17:27:59
	with linear mixing	2009-09-28 17:28:02
<toffer>	well i don't know that much about ctw since i lost interest in it after noticing that it's rather inefficient	2009-09-28 17:29:37
	the other idea might be interesting for suer	2009-09-28 17:30:10
	sure	2009-09-28 17:30:11
<Shelwien>	then there's one more thing	2009-09-28 17:30:52
	afair, in my likelihood interpretation for logistic mixing	2009-09-28 17:31:10
	the bit probability estimation was derived from likelihoods	2009-09-28 17:31:35
	and mixer weight was optimized by only that	2009-09-28 17:32:24
	but maybe we should try optimizing by the whole codelength	2009-09-28 17:33:34
	well, i already did that with BFA though	2009-09-28 17:33:49
	and current mixer implementations (both logistic and linear) seem to have better performance	2009-09-28 17:34:25
<toffer>	these are adaptive	2009-09-28 17:34:54
	bfa is limited to a fixed window	2009-09-28 17:35:01
<Shelwien>	BFA is too, of course	2009-09-28 17:35:03
	not really, though there was one like that too	2009-09-28 17:35:21
<toffer>	afaik bfa searches for an optimal parameter	2009-09-28 17:35:34
	via bruteforcing	2009-09-28 17:35:36
<Shelwien>	most straightforward BFA implementation	2009-09-28 17:35:48
<toffer>	for a single situation	2009-09-28 17:35:55
	but	2009-09-28 17:35:57
	both paq and linear mixer updates are based on adaptation taking previous results into account	2009-09-28 17:36:23
<Shelwien>	is accumulating (exponentially weighted) codelengths with different parameter values	2009-09-28 17:36:24
<toffer>	that is inefficient	2009-09-28 17:36:34
	since it introduces phase delays	2009-09-28 17:36:42
	which isn't the case for gradient descent	2009-09-28 17:36:59
<Shelwien>	in a way, but that's not always bad	2009-09-28 17:37:04
<toffer>	well, yes	2009-09-28 17:37:38
	but you might imagine what iterative minimization actually does	2009-09-28 17:38:09
	the cost function slightly changes each step	2009-09-28 17:38:25
	thus you can assume that between a few steps it is constant	2009-09-28 17:38:39
	and the parameter noise gets averaged	2009-09-28 17:38:56
	and i need to get into a supermarket now to buy something to eat	2009-09-28 17:39:41
<Shelwien>	;)	2009-09-28 17:39:46
<toffer>	i'll be back in 30mins	2009-09-28 17:39:47
<Shelwien>	good luck ;)	2009-09-28 17:39:53
<toffer>	and that dna guy is really annoying to me	2009-09-28 17:40:36
<Shelwien>	yeah, but dna sequence itself is interesting	2009-09-28 17:41:12
	currently using it for FMA testing ;)	2009-09-28 17:41:26
	(e.coli)	2009-09-28 17:41:31
	also there's stuff like "complementary polyndromes"	2009-09-28 17:44:56
	about which i'm not sure how to model it with CM	2009-09-28 17:45:17
	these specialized implementations which support it, though	2009-09-28 17:45:50
	are LZ-like and have worse results than ash ;)	2009-09-28 17:46:23
	*palindromes	2009-09-28 17:53:11
	;)	2009-09-28 17:53:11
*** pinc has joined the channel		2009-09-28 19:18:48
*** pinc has left the channel		2009-09-28 19:49:28
	!quit	2009-09-28 19:54:57