*** maniscalco_ has joined the channel2019-05-07 21:32:03
*** Jibz has left the channel2019-05-07 22:13:24
*** maniscalco__ has joined the channel2019-05-08 00:57:34
*** maniscalco_ has left the channel2019-05-08 00:59:37
*** maniscalco__ has left the channel2019-05-08 01:01:53
*** maniscalco_ has joined the channel2019-05-08 01:54:51
<unic0rn> you've mentioned "intel" and "smart" in one sentence and channel died2019-05-08 04:51:38
 ;)2019-05-08 04:51:39
<Shelwien> well, did you see my benchmark stats for gcc9?2019-05-08 06:38:11
 for example, here's how gcc and icc PGO works:2019-05-08 07:08:06
 3.484s 3.047s: CMo8-gcc91-x64-SSE42019-05-08 07:08:09
 3.422s 3.094s: CMo8-gcc91-x64-SSE4-PGO2019-05-08 07:08:09
 3.437s 3.062s: CMo8-ic19-x64-SSE42019-05-08 07:08:09
 3.188s 2.781s: CMo8-ic19-x64-SSE4-PGO2019-05-08 07:08:09
<unic0rn> nope2019-05-08 07:36:16
 not saying it's not impressive, it's just that when i can avoid C/C++, i will2019-05-08 07:36:58
<Shelwien> i can't say i especially like it either2019-05-08 07:38:54
<unic0rn> for everything else i would rather choose llvm because of the many targets2019-05-08 07:38:55
<Shelwien> i've been programming in asm for a very long time2019-05-08 07:39:37
<unic0rn> well, C/C++ performs well, and it gives the programmers the tools they need. not surprising, considering the amount of C/C++ code around2019-05-08 07:39:44
 but that still doesn't make the language good per se2019-05-08 07:39:56
<Shelwien> made my own DPMI framework for DOS2019-05-08 07:40:12
<unic0rn> ah, dpmi. good old times.2019-05-08 07:40:27
<Shelwien> and basically a custom asm language via tasm macros2019-05-08 07:40:28
<unic0rn> "I CAN HAS 32BIT FLAT MODEL NOW?"2019-05-08 07:40:35
<Shelwien> DPMI wasn't flat, you could use segment registers there2019-05-08 07:40:59
 thing is, intel actually provided a perfect solution for memory allocation in hardware2019-05-08 07:41:23
<unic0rn> yeah, but iirc, it was used with flat model usually2019-05-08 07:41:29
 also, i recall just switching to the flat model was dead simple2019-05-08 07:41:40
<Shelwien> via virtual memory and segments2019-05-08 07:41:41
 with selector:offset pointers2019-05-08 07:42:04
 you can always transparently reallocate the memory block it references2019-05-08 07:42:18
 also virtual memory allows copying only one page, the full pages can simply be remapped to a different addr2019-05-08 07:42:52
 and selector can be transparently reassigned to different base addr which fits the reallocated memory block2019-05-08 07:43:20
 so there was a perfect solution to dynamic memory usage2019-05-08 07:43:41
<unic0rn> back then, i was mostly toying with amiga 600 and m68k2019-05-08 07:43:45
 and all the amiga hardware. had a copy of a perfect book, describing every single bit in it2019-05-08 07:44:09
 so i've made custom video mode that used all the available dma cycles for copper for dynamic palette changes for photo display, so without additional - fast - memory, cpu was able to read new instructions only during vblank2019-05-08 07:44:55
 fun times2019-05-08 07:45:10
<Shelwien> yeah2019-05-08 07:45:37
 i also made a video mode editor for PC2019-05-08 07:45:44
<unic0rn> never really published it though. it was just a personal toy, amiga was half-dead by then anyway2019-05-08 07:46:00
 or actually mostly dead2019-05-08 07:46:13
 not sure if x-mode was more flexible than what amiga did. possibly yes, in frequency terms.2019-05-08 07:47:40
 it still had a few nice tricks though. using overscan area on tv, and switching between 50 and 60hz iirc2019-05-08 07:48:05
 interlace modes in 60hz were actually acceptable2019-05-08 07:48:23
 that reminds me of working with video toaster on pc later on2019-05-08 07:49:27
<Shelwien> the text mode on PC was actually an equivalent of 720x400 16-color graphics2019-05-08 07:49:36
 which wasn't available normally2019-05-08 07:50:01
 so it wasn't possible to make a natural-looking textmode screenshot2019-05-08 07:50:21
<unic0rn> text modes..2019-05-08 07:50:52
<Shelwien> it was also a good thing2019-05-08 07:51:20
<unic0rn> i spent a lot of time with linux and freebsd, but as soon as it became possible, i've started using framebuffers for text consoles2019-05-08 07:51:31
<Shelwien> i ended up having to simulate it when writing this: https://github.com/Shelwien/cmp2019-05-08 07:51:41
 and apparently fast character generation is not an easy task2019-05-08 07:52:04
<unic0rn> it is not. especially today, with all the fancy, vector, antialiased fonts2019-05-08 07:52:53
<Shelwien> first, i had to dump system font to bitmap, since using winapi directly was too slow2019-05-08 07:53:02
 and then there was a problem with coloring it2019-05-08 07:53:22
<unic0rn> it's writing a custom terminal emulator, basically, or rather, just the display part of it2019-05-08 07:53:54
 btw, are you familiar with zx spectrum video memory?2019-05-08 07:54:11
<Shelwien> that is, simply expanding a character from bitmap to specified fore- and background colors2019-05-08 07:54:27
 was also visibly slow2019-05-08 07:54:36
 i ended up having to cache already colored symbols2019-05-08 07:55:04
 as to zx, yes, it was annoying2019-05-08 07:55:39
<unic0rn> lets not forget the extended 512x192 timex mode. actually, mine had additional one, 256x192 but with 8x1 pixel attributes2019-05-08 07:56:21
 never saw a single piece of software using that one2019-05-08 07:56:31
 but mine was damaged anyway, it was displaying only half of the colors, so i didn't bother experimenting that much. if not for the fact i was just a kid, i could perhaps fix it, assuming it was a single bit not going through from the memory to ULA2019-05-08 07:58:19
<Shelwien> btw, digitized music also worked in similar ways at that time2019-05-08 07:58:33
 it was possible to record a song and then play it2019-05-08 07:59:05
<unic0rn> the audio clock on amiga... that was a mess2019-05-08 07:59:20
<Shelwien> but it also requred full cpu capacity2019-05-08 07:59:32
 because 22khz *8 for 1bit beeper...2019-05-08 07:59:45
<unic0rn> yeah, on speccy2019-05-08 08:00:07
<Shelwien> and also data compression was pretty important2019-05-08 08:00:08
<unic0rn> not sure if people didn't try sample playback via white noise or something, on AY2019-05-08 08:00:23
<Shelwien> so we had a few good-quality songs2019-05-08 08:00:33
<unic0rn> i know i've experimented with 14bit (or was it 12, i don't remember) audio on amiga2019-05-08 08:00:46
<Shelwien> as their own programs, which couldn't do anything else2019-05-08 08:00:47
<unic0rn> because the 8bit resolution was subject to 6bit i think volume adjustment2019-05-08 08:01:15
 in hardware2019-05-08 08:01:22
 so you could have 2 channels with higher resolution instead of 2 with lower2019-05-08 08:01:41
 4 with lower*2019-05-08 08:01:47
 that was used a lot actually i think2019-05-08 08:02:10
*** Jibz has joined the channel2019-05-08 08:02:11
<Shelwien> yeah2019-05-08 08:02:34
<unic0rn> as for music, well. the king of the hill was the .mod format for a time2019-05-08 08:02:37
<Shelwien> true, but that was less interesting2019-05-08 08:03:05
<unic0rn> yeah. everything breaking the limits was interesting back then2019-05-08 08:03:30
 after that, came svga and sound blaster2019-05-08 08:03:40
<Shelwien> since you can't just make one for a given song without musical background2019-05-08 08:03:43
<unic0rn> and all that was left was MOAR CYCLES2019-05-08 08:03:49
<Shelwien> while even on 6502 it was possible to digitize a normal song via LPT port2019-05-08 08:04:04
<unic0rn> well, and voodoo graphics2019-05-08 08:04:05
<Shelwien> yeah, and it all became boring2019-05-08 08:04:23
 so i mostly switched to compression2019-05-08 08:04:31
<unic0rn> well, there's raytracing now. and voxels. and GPUs finally capable of rendering, or even raytracing, voxels, even without all that RTX stuff from nvidia2019-05-08 08:05:28
 on 3D graphics front, it's interesting, just way way more complicated than it was before2019-05-08 08:05:54
<Shelwien> its only interesting to watch2019-05-08 08:06:31
<unic0rn> similar with gamedev in general. it may be sad to not be able to push the limits of entirely custom hardware like before, but now that even consoles use standard solutions, one can focus on the software side of the problem, instead of working out the quirks of the hardware2019-05-08 08:07:21
<Shelwien> i'm not an artist, so i can't provide nice models for modern 3D2019-05-08 08:07:24
 and even properly using existing APIs and hardware is already pretty hard2019-05-08 08:08:14
<unic0rn> 3D modelling isn't hard, as long as you've got some drawings/photos as a guide2019-05-08 08:08:18
<Shelwien> i know, i can do it2019-05-08 08:08:38
<unic0rn> well, the APIs are a mess at times2019-05-08 08:08:46
<Shelwien> its just not interesting for me, i'm a programmer2019-05-08 08:08:47
 but there's not much interesting work for a programmer in modern 3D2019-05-08 08:10:09
<unic0rn> the biggest mess for now is i guess the fact that windows API is object oriented. with C++ compilers, it plays nicely... more or less.2019-05-08 08:10:12
<Shelwien> well, runtime 3D2019-05-08 08:10:18
 its just struggling to find workarounds to make weird APIs do what you want2019-05-08 08:10:48
<unic0rn> runtime? considering how raytraced voxels haven't made their way into the gaming world properly just yet, and at the same time, are already possible on current hardware, i would say there's a lot of interesting work to be done2019-05-08 08:11:02
 a lot of it on the gpu side of things2019-05-08 08:11:25
<Shelwien> well, its too complicated2019-05-08 08:11:44
 a very limited number of people can actually contribute2019-05-08 08:11:58
<unic0rn> it's not really rocket science, but it takes some learning2019-05-08 08:12:11
<Shelwien> because they have access to all required documentation and tools2019-05-08 08:12:15
 and know how to use them2019-05-08 08:12:30
<unic0rn> there are examples of raytracing voxels on gpus, with source i guess, or at least very detailed explanation2019-05-08 08:12:32
<Shelwien> i mean2019-05-08 08:12:51
 what's the point of just making a demo showing voxels?2019-05-08 08:13:09
 there was a "mars" demo on DOS which did that2019-05-08 08:13:22
<unic0rn> it isn't really using any hard tricks on the gpu side afaik. now, i didn't experiment with it myself, but i've read a few articles on blogs and saw a few things2019-05-08 08:13:25
<Shelwien> without GPU even2019-05-08 08:13:29
<unic0rn> well, there was 64k intro with software raytracing in the DOS era2019-05-08 08:13:46
 and no, demo makes little sense2019-05-08 08:13:56
 but adding raytraced voxels to some existing engine, like godot, that's something else2019-05-08 08:14:19
 but a lot of interesting stuff is going on on the OS/toolchain front right now, that's also true. flutter for android/iOS/web and desktop possibly/and in the end, for fuchsia2019-05-08 08:15:48
 the whole fuchsia OS2019-05-08 08:15:51
 i wonder where google is going with it2019-05-08 08:16:16
 hopefully to replace both android and chromeos2019-05-08 08:16:29
 and well, the whole "lets invent new language" thing, so popular in recent years. there's kotlin for java diehard fans, there's rust, winning over a lot of C/C++ people2019-05-08 08:18:09
 and it's never been easier to create something new in that regard, thanks to llvm2019-05-08 08:18:28
<Shelwien> rust uses llvm to compile2019-05-08 08:18:36
<unic0rn> yeah, i know.2019-05-08 08:18:43
<Shelwien> so it can't yet compete with C/C++ at performance2019-05-08 08:18:49
<unic0rn> depending on the code, most likely. programmers good at C/C++ are good enough to throw an intrinsic or two at a performance critical function2019-05-08 08:19:33
 sure, the same code, less optimized by hand, will be slower2019-05-08 08:20:00
<Shelwien> well, no point2019-05-08 08:20:15
<unic0rn> but i guess there are other features of the language that make up for that2019-05-08 08:20:19
<Shelwien> rust only provides more work for programmer, to work around its strictness2019-05-08 08:20:57
<unic0rn> sometimes it's strictness that people are after2019-05-08 08:21:19
 one-man's-army's tool may be team's horror show2019-05-08 08:21:38
<Shelwien> in theory, i understand that side of language development2019-05-08 08:21:51
<unic0rn> i prefer to work with my code alone2019-05-08 08:21:59
<Shelwien> yeah2019-05-08 08:22:12
<unic0rn> but i can understand why some people prefer the language taking care of some things and limiting the resulting clusterf..2019-05-08 08:22:18
<Shelwien> i don't like when i have to convince the compiler to do what i want2019-05-08 08:22:33
<unic0rn> it's why i like forth so much2019-05-08 08:22:33
 it puts no limits2019-05-08 08:22:40
<Shelwien> rather than just doing what i asked2019-05-08 08:22:53
<unic0rn> C compiler will track a lot of things and complain, usually rightfully so, but not always, sometimes you know what you're doing, but the compiler doesn't and you need to tell it explicitly2019-05-08 08:23:36
 forth has no such overhead. either you know what you're doing, or it'll crash2019-05-08 08:23:59
<Shelwien> well, one example of what i'm talking about2019-05-08 08:24:15
 C++ has templates2019-05-08 08:24:23
 they're kinda like macros2019-05-08 08:24:37
<unic0rn> you really do love your... oh wait. we've been here before.2019-05-08 08:25:03
<Shelwien> you can substitute a constant or a type into a piece of code2019-05-08 08:25:03
 well, its very very useful, i'd use the same approach even without templates2019-05-08 08:25:46
 its possible to simulate with macros and #include2019-05-08 08:26:00
<unic0rn> well, lispers love lisp for its macros for a reason2019-05-08 08:26:05
 except i guess macros in lisp can do a lot more2019-05-08 08:26:20
<Shelwien> my point is different here2019-05-08 08:26:20
 so, there's this useful language feature2019-05-08 08:26:29
 and with VS/IntelC/clang I can just use it2019-05-08 08:26:50
 that is, usually I'd have a normal function first2019-05-08 08:27:22
 then I'd need multiple instances of it, for example a version for decoding and another for encoding2019-05-08 08:28:01
 so normally I can just add a template< int f_DEC > line before the function declaration2019-05-08 08:28:34
 and that's it, f_DEC would behave like a macro constant2019-05-08 08:28:48
 and compiler would generate two instances of function code2019-05-08 08:29:06
 each separately optimized 2019-05-08 08:29:28
 like, branches only necessary for other modes would disappear etc2019-05-08 08:29:47
 and as I said, it works with 3 different C++ compilers2019-05-08 08:30:10
 and then there's gcc2019-05-08 08:30:18
 where developers decided that somehow at this specific point they want hard standard compliance2019-05-08 08:30:50
 so, when a template has a class parameter2019-05-08 08:31:25
 gcc doesn't want to see the contents of that class2019-05-08 08:32:00
 so I have to add using directives for each "imported" name2019-05-08 08:32:32
 just for gcc2019-05-08 08:32:35
 like this: https://github.com/Shelwien/stegdict/blob/master/model.inc#L62019-05-08 08:32:53
 that's the compiler strictness i mean2019-05-08 08:33:23
 and the actual reason for it2019-05-08 08:33:40
 is that C++ standard group2019-05-08 08:33:55
 wanted to implement modular compilation for templates, like the usual .c/.h approach2019-05-08 08:34:36
 in which case, yeah, the compiler won't see the contents of parameter class, because its defined in other module2019-05-08 08:35:28
 but its only what they wanted2019-05-08 08:35:49
 they actually failed to do it, there's no C++ compiler which lets you split template declarations and implementations2019-05-08 08:36:34
 templates are just fully defined in header files usually, along with all the code2019-05-08 08:37:10
 and still, gcc forced me to write extra declarations, just for it2019-05-08 08:37:33
<unic0rn> do they implement it in all standard modes, or just gnu?2019-05-08 08:38:30
 because if in all, that's just stupid2019-05-08 08:38:43
 i mean, specs are one thing, but the reality is another2019-05-08 08:38:58
<Shelwien> its specified in the main C++ standard2019-05-08 08:39:21
<unic0rn> yeah, i know2019-05-08 08:39:28
<Shelwien> so gcc follows it in all modes2019-05-08 08:39:29
<unic0rn> that's silly. if no other compiler implements it, that's silly2019-05-08 08:40:07
<Shelwien> compared to that, the MS/VS approach to C++ is more user-friendly2019-05-08 08:40:20
 its not just this2019-05-08 08:40:33
 there're lots of features, where MS syntax extensions are something that is useful for a programmer2019-05-08 08:41:01
<unic0rn> the whole GNU-everything is rarely user friendly tbh. it tries to be, but the priority is always being friendly to the ideology and doing the right thing when it comes to the specs i guess2019-05-08 08:41:23
<Shelwien> like access operators for class fields, or various pragmas2019-05-08 08:41:46
 while gcc design choices are usually made based on some abstract concepts2019-05-08 08:42:20
 like, "the theory for this is more interesting"2019-05-08 08:42:50
 and then i have to look for workarounds2019-05-08 08:43:01
<unic0rn> well, extensions in general are another thing. imho it's the language that should define everything that is needed. not even talking about compiler pragmas, because that's another thing, but stuff like intrinsics. "will it work on this compiler? does it differ? if so, how?" why the hell can't we have such things standardized properly2019-05-08 08:44:08
<Shelwien> intrinsics are another fun thing2019-05-08 08:44:49
 they're barely (and only partly) documented2019-05-08 08:45:03
 so hard to expect much :)2019-05-08 08:45:13
<unic0rn> they shouldn't match the platform in the first place2019-05-08 08:45:21
 compiler should take care of mapping that2019-05-08 08:45:30
<Shelwien> the most interesting example for me2019-05-08 08:45:35
<unic0rn> you should just work with vectors, without giving a damn if it'll be compiled for x86 or arm2019-05-08 08:45:53
<Shelwien> was when i found - while reading a fixed bug list for a new version2019-05-08 08:46:06
 that IntelC apparently supports gnu inline asm2019-05-08 08:46:18
 i mean, it also supports MS inline asm, and that's actually documented2019-05-08 08:46:51
 while gnu syntax support is undocumented and limited2019-05-08 08:47:12
 for example, it only supports AT&T asm syntax (in intel compiler!)2019-05-08 08:47:28
 and doesn't support named labels2019-05-08 08:47:36
<unic0rn> someone had to figure out it'll be useful for some customers.2019-05-08 08:47:37
<Shelwien> well, it is very useful2019-05-08 08:47:47
<unic0rn> hint word: customers.2019-05-08 08:47:51
 gcc doesn't have such concept.2019-05-08 08:47:58
<Shelwien> MS-style asm can't be used these days2019-05-08 08:47:59
 because even if you'd write a highly optimized function in asm2019-05-08 08:48:37
 (doesn't matter if its inline asm or separately compiled)2019-05-08 08:48:54
 when using that function, compiler automatically has to assume2019-05-08 08:49:22
 that it would mess up all the registers2019-05-08 08:49:32
 and memory references by all pointers2019-05-08 08:49:53
 so very frequently, translating parts of code to asm doesn't help2019-05-08 08:50:38
 you'd either have to rewrite the whole internal loop in asm2019-05-08 08:50:54
<unic0rn> that's the problem with C/C++ compilers.2019-05-08 08:50:58
 they have to assume a whole lot2019-05-08 08:51:05
<Shelwien> well, gnu inline asm actually provides a solution to that2019-05-08 08:51:12
<unic0rn> throw a wrench inside and they've got a problem2019-05-08 08:51:20
<Shelwien> it has an explicit syntax for what asm code's inputs and outputs2019-05-08 08:51:41
<unic0rn> marking registers?2019-05-08 08:51:43
 yeah2019-05-08 08:51:46
<Shelwien> and what other stuff it modifies2019-05-08 08:52:03
 so this kind of inline asm does improve performance2019-05-08 08:52:22
 unlike externally defined or MS-inline2019-05-08 08:52:35
 and it exists in IntelC2019-05-08 08:52:52
 while still being undocumented :)2019-05-08 08:53:11
<unic0rn> i guess i'll think about it more if i'll be writing my own forth-to-x86-64 compiler. llvm target is the primary idea, so that may take some time2019-05-08 08:53:18
<Shelwien> https://encode.ru/threads/418-Inline-assembly-routines-for-paq8?p=8559&viewfull=1#post85592019-05-08 08:53:54
<unic0rn> reva forth i'm using, isn't doing any optimizations. it's brain dead in that regard. but it's simple and compact.2019-05-08 08:53:56
 if you'll define any word, it'll put a call to it, unless it's marked as inline2019-05-08 08:54:21
<Shelwien> well, 40k is not that compact2019-05-08 08:54:23
<unic0rn> many builtin words are inline, so there's that2019-05-08 08:54:39
 but still, stuff like 2 << gets compiled into "push 2 on the stack" - that's 3 asm instructions2019-05-08 08:55:03
<Shelwien> with C++, you can expect a 3-4k exe file for a simple cmdline utility2019-05-08 08:55:19
<unic0rn> then <<, which loads the parameter from the stack2019-05-08 08:55:26
 and does shl2019-05-08 08:55:33
 i've just replaced all that crap with 3byte inline definition in asm, shl eax, 22019-05-08 08:55:55
 because top of the stack is always in eax2019-05-08 08:56:03
<Shelwien> :)2019-05-08 08:56:11
<unic0rn> but that brings me to possible optimizations, in forth it could be very simple, but then doing inline stuff could become problematic as well, if not downright impossible without disabling those optimizations locally2019-05-08 08:56:59
 marking input and output counts for forth words (usually done with comments anyway)2019-05-08 08:57:27
 and especially, marking the location of inputs in core words, like <<2019-05-08 08:57:56
 if it would be treated like a macro, shl eax, %input12019-05-08 08:58:15
<Shelwien> well, that's why i'm not that excited about forth now2019-05-08 08:58:26
<unic0rn> then it would be trivial for the compiler to optimize it2019-05-08 08:58:31
<Shelwien> though i did use it at some point2019-05-08 08:58:37
 its good to know that this kind of approach exist2019-05-08 08:59:04
<unic0rn> well, it's not really that different from the C/C++ world in that regard. you gotta jump through some hoops to not mess with optimizations done by the compiler with your own assembly code2019-05-08 08:59:11
 unless the compiler will try to parse it all to understand it2019-05-08 08:59:21
<Shelwien> yes, but forth is also fully stack-based2019-05-08 08:59:29
 and while simple arithmetics are usually ok2019-05-08 08:59:42
<unic0rn> that's why i'm willing to target llvm. will see what it'll generate at the assembly level2019-05-08 08:59:59
<Shelwien> there're always some operators which do complicated things with stack2019-05-08 09:00:29
<unic0rn> but it should be able to use many more registers than a regular forth compiler. of course, one can write such optimizations by hand, but what's the point if llvm does that already2019-05-08 09:00:39
<Shelwien> yes2019-05-08 09:00:54
 like i'm saying, normal math can be translated normally2019-05-08 09:01:11
 but branches, loops and such2019-05-08 09:01:23
 are usually implemented in forth too2019-05-08 09:01:50
 they don't use some non-forth logic2019-05-08 09:01:58
<unic0rn> yeah, but 90% of the stuff that ends up being redundant is just there because of the use of the stack. if llvm can figure it out and throw away a whole bunch of code, the end result should be fast2019-05-08 09:02:01
<Shelwien> well, i don't think so2019-05-08 09:02:24
 any kind of operation that does indirect access to stack2019-05-08 09:02:42
<unic0rn> branches are simple2019-05-08 09:02:42
<Shelwien> like access to n'th word indexed by a variable2019-05-08 09:03:07
 and llvm won't save you2019-05-08 09:03:15
<unic0rn> if it's indexed by top of the stack, that should be single asm instruction2019-05-08 09:03:35
<Shelwien> sure2019-05-08 09:03:55
 what i mean, it won't be merged with anything2019-05-08 09:04:07
<unic0rn> in case of reva, mov eax, [esi+eax*4]2019-05-08 09:04:09
<Shelwien> so while C for loop can be vectorized etc2019-05-08 09:04:25
 you won't have that with forth2019-05-08 09:04:35
<unic0rn> that's also not entirely true i think2019-05-08 09:04:57
<Shelwien> i'm not talking about just translating it to C/asm/llvm and making that work2019-05-08 09:05:01
<unic0rn> i was thinking about that some time ago2019-05-08 09:05:07
<Shelwien> but stack operations can be hard to optimize2019-05-08 09:05:14
<unic0rn> if each word has defined inputs and outputs, that is, stack balance2019-05-08 09:05:26
<Shelwien> its kinda similar to refactoring of recursive functions2019-05-08 09:05:30
<unic0rn> then the compiler can keep track of it2019-05-08 09:05:37
 yeah, recursion is a bad case for that2019-05-08 09:05:52
 but regular loops should keep their stack balanced2019-05-08 09:06:05
<Shelwien> well, whole forth is like recursion in a way2019-05-08 09:06:11
<unic0rn> if they don't, they're badly written2019-05-08 09:06:17
<Shelwien> because functions in C can't leave stack unbalanced2019-05-08 09:06:30
<unic0rn> well, it's a different stack.2019-05-08 09:06:50
 there's separate return stack after all, and i wouldn't want to mess the balance of that one2019-05-08 09:07:12
<Shelwien> sure, i mean, you can't map forth words to C functions2019-05-08 09:07:30
<unic0rn> not easily, no2019-05-08 09:07:57
 well, unless a word returns just a single value, then it's very simple2019-05-08 09:08:37
 funny thing, that's the case with my current compression code2019-05-08 09:09:18
 if a word returns anything at all, it's a single value, never more iirc2019-05-08 09:09:29
 but when it comes to constants and global variables, there are over 20 of those altogether2019-05-08 09:10:39
<Shelwien> https://github.com/riywo/llforth :)2019-05-08 09:10:46
<unic0rn> i saw that one2019-05-08 09:11:06
 didn't test it, looks very basic2019-05-08 09:11:11
 i'm looking to write something more serious2019-05-08 09:11:26
<Shelwien> maybe you can test it with your compressor?2019-05-08 09:11:31
 if it improves speed, that'd be good?2019-05-08 09:11:38
<unic0rn> mostly compliant with ANS forth and capable of doing library calls, so for example basic opengl app should be a no brainer2019-05-08 09:11:53
 yeah, that's the idea, when i'm done with coding the compression :P2019-05-08 09:12:05
 i also plan to write it in pretty modular way2019-05-08 09:12:51
 so when the llvm target won't work the way i wanted, i'll just add x86-64 target2019-05-08 09:13:19
 of course it'll be missing the llvm optimizations at the beginning2019-05-08 09:13:41
 but it won't mean the whole code will go out of the window2019-05-08 09:13:57
<Shelwien> hm, the book that it mentions: http://download.library1.org/main/2065000/a76c11fd8609daa1fe299009a8e83a55/Igor%20Zhirkov%20%28auth.%29%20-%20Low-Level%20Programming_%20C%2C%20Assembly%2C%20and%20Program%20Execution%20on%20Intel%C2%AE%2064%20Architecture-Apress%20%282017%29.pdf2019-05-08 09:14:15
<unic0rn> nice one2019-05-08 09:15:26
 as for llvm, i'll have to research it a bit for that implementation to be optimal2019-05-08 09:16:48
 i mean, i could do two modes, compile core words + repl into llvm IR and as a result, to executable, load everything else on the fly and interpret, and second mode - release - noninteractive, compile everything to llvm IR and into executable2019-05-08 09:17:37
 but that isn't optimal2019-05-08 09:17:42
<Shelwien> yeah, need a real compiler2019-05-08 09:18:13
<unic0rn> it should be able to compile definitions on the fly, so it would have to use llvm library for that2019-05-08 09:18:13
 and that's a potential problem actually, depending on what can be done with llvm exactly. most likely everything, but then such interactive mode will go deeper into lowlevel stuff than the actual compilation in release mode2019-05-08 09:19:16
<Shelwien> interpreter is only necessary for reflection and dynamic eval2019-05-08 09:19:25
<unic0rn> because in release mode, you just throw IR at llvm, "do your thing"2019-05-08 09:19:26
<Shelwien> but where would you use that normally?2019-05-08 09:19:37
 well, rather than llvm, I'd just generate C code on output2019-05-08 09:20:04
 that would make it compatible with all C compilers, including llvm2019-05-08 09:20:18
<unic0rn> in interactive mode, the code interacts with each other, and you don't wanna recompile everything on every single change. on the other hand, there's an issue of the stack and sharing it between words compiled separately2019-05-08 09:20:20
 llvm is better suited for such thing, like compiling small fragments of code, the putting it together part is something i gotta research2019-05-08 09:20:57
 many JIT compilers use llvm under the hood2019-05-08 09:21:16
<Shelwien> as i said, why would you need interactive mode? for debug?2019-05-08 09:21:20
<unic0rn> that's the power of forth, after all2019-05-08 09:21:30
 yeah, debug2019-05-08 09:21:33
<Shelwien> yeah, but they're not stack-based usually2019-05-08 09:21:39
<unic0rn> if it's a matter of passing a stack pointer to such dynamically compiled words, i think it's fine2019-05-08 09:22:21
 but there's also the more important question, is it really worth it at all2019-05-08 09:22:40
 dealing with compilation on the fly, instead of just interpreting2019-05-08 09:22:53
<Shelwien> plain compiling yes, interactive compiling - no2019-05-08 09:23:03
<unic0rn> forth interpreter is extremely simple2019-05-08 09:23:04
 well, it would be worth it if the words would be heavy2019-05-08 09:23:17
 but this is forth, that's not the case2019-05-08 09:23:22
<Shelwien> can always add logging if you need to debug specifically the compiled code2019-05-08 09:23:25
<unic0rn> there would be a ton of word calls2019-05-08 09:23:32
 doing little things2019-05-08 09:23:39
 compiling them one by one makes little sense performance-wise2019-05-08 09:23:51
<Shelwien> yeah2019-05-08 09:23:56
<unic0rn> well, i'll keep interactive mode for sure.2019-05-08 09:24:20
 it's fun and it's useful2019-05-08 09:24:26
 it'll just be interpreted2019-05-08 09:24:32
 so a lot slower than compiled mode, but should be fast enough to be useful for testing2019-05-08 09:24:49
 i think i'll just add an option to compile some things, not all.2019-05-08 09:25:12
 so basically it'll be possible to rebuild the whole interpreter with part of the code developer is working on, built in2019-05-08 09:25:55
 as "works, do not touch"2019-05-08 09:26:03
 leaving the parts to be modified, interpreted2019-05-08 09:26:13
 and as for interactive mode, since words are usually doing small things, it's useful to be able to test them by hand with different inputs2019-05-08 09:27:13
 the smaller the words are and the more words there are, the simpler it gets to trace a bug2019-05-08 09:27:51
<Shelwien> dunno, my C++ functions are usually also pretty simple2019-05-08 09:28:07
 and I never use interactive debug, except for compiler bugs2019-05-08 09:28:24
 just add logging or test scripts in the code2019-05-08 09:28:55
<unic0rn> well, the biggest word i've got in compression code, is 20 lines long i think2019-05-08 09:29:27
 most of those are actually single words2019-05-08 09:29:35
 of course, i also have a few onliners that are much more of a mess2019-05-08 09:29:53
 but still, those are oneliners. short moment to write and debug, leave around as a black box that does its thing2019-05-08 09:30:17
<Shelwien> i use perl for these2019-05-08 09:30:42
<unic0rn> i think that's the idea with forth, when it comes to the more lowlevel code, being cut down into very small pieces2019-05-08 09:30:45
 it's fast to write them and it's faster to rewrite them than to debug them2019-05-08 09:31:00
<Shelwien> its got regexp, and that's what is usually need2019-05-08 09:32:21
 but i don't really see where forth would be the best tool2019-05-08 09:32:52
 forth is easier to parse, but its not especially readable2019-05-08 09:33:30
<unic0rn> it can be readable, but it requires some time getting used to2019-05-08 09:33:50
 then it's more readable than everything else i saw2019-05-08 09:34:01
 but getting to writing such code takes time and determination. i've tried forth before. i even swore to never touch it again, because of the mess i've created with it2019-05-08 09:35:09
 now i'm doing much better, still not optimal i think though2019-05-08 09:35:34
<Shelwien> for example: void put4( uint c, FILE* g ) { putc(c,g); putc(c>>8,g); putc(c>>16,g); putc(c>>24,g); }2019-05-08 09:35:53
 how would that look in forth?2019-05-08 09:36:01
<unic0rn> : put4 ( c file -- ) here 4 allot rot over !2019-05-08 09:43:42
  4 swap write ;2019-05-08 09:43:43
 most likely like this2019-05-08 09:43:49
 ah, forgot to deallocate2019-05-08 09:43:58
 add -4 allot at the end2019-05-08 09:44:10
 write is reva forth, ans forth has write-file, same thing only returns ioerror on the stack2019-05-08 09:45:00
 reva has ioerr variable if one wants to check it2019-05-08 09:45:11
<Shelwien> uh, no, your version would be 100x slower :)2019-05-08 09:45:26
<unic0rn> both take address, byte count and fileid2019-05-08 09:45:30
 not sure2019-05-08 09:45:54
<Shelwien> putc writes to memory buffer, its not a direct write call2019-05-08 09:46:29
 also its written without a loop because of inlining2019-05-08 09:47:10
 well, loop would be unrolled and inlined anyway, but loop syntax won't be shorter, but would be harder to read2019-05-08 09:48:20
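Shelwien's `put4` from earlier in the exchange can be sketched as a self-contained C snippet; note the output is little-endian regardless of host byte order, because the least significant byte goes first:

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned int uint;

/* Shelwien's example, reformatted: writes a 32-bit value as four bytes,
   least significant first. putc goes through stdio's internal buffer,
   so this is not four write() syscalls. */
void put4(uint c, FILE* g) {
    putc(c, g);
    putc(c >> 8, g);
    putc(c >> 16, g);
    putc(c >> 24, g);
}
```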
<unic0rn> well, that's comparing apples to oranges. reva isn't the fastest one for sure. and neither reva afaik, nor ans forth, have such buffered calls. that being said, it's a single write in that code2019-05-08 09:48:42
 unless you wanna call it in a loop2019-05-08 09:48:56
 but no one sane will call unbuffered write in a loop2019-05-08 09:49:09
 so yeah, direct translation without knowing the context, is a bad idea here2019-05-08 09:49:34
<Shelwien> well, i thought you'd use putc as another word2019-05-08 09:49:35
<unic0rn> could, went for lazy solution, wasn't sure what you're after2019-05-08 09:50:03
<Shelwien> so expected something like: c 8 shr g putc2019-05-08 09:50:30
 which is hardly more readable than C :)2019-05-08 09:50:56
 but i guess what you posted is an even better example in that sense :)2019-05-08 09:51:14
<unic0rn> it does what it needs within a single word. it isn't optimal2019-05-08 09:51:44
 normally i would throw away all the allots and if it's to be used in a loop, redefine put4 entirely, to write c into a preallocated buffer and increase a pointer variable2019-05-08 09:52:49
 whether it should check for the buffer size and flush automatically, or just expect manual flush, matter of taste, easy enough to do both2019-05-08 09:53:30
<Shelwien> well, that was presumed as things putc already does2019-05-08 09:53:58
<unic0rn> and "save this int to a buffer, increase pointer" is just bufptr @ ! bufptr 4 +!2019-05-08 09:54:41
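The buffer-and-bump-pointer pattern unic0rn describes (`bufptr @ !  bufptr 4 +!`) can be sketched in C; all names here (`buf`, `bufptr`, `buf_put4`) are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of "save this int to a buffer, increase pointer":
   one store into a preallocated buffer plus a pointer bump, the C
   analogue of the Forth one-liner above. */
static uint8_t  buf[1 << 16];
static uint8_t* bufptr = buf;

static void buf_put4(uint32_t c) {
    memcpy(bufptr, &c, sizeof c);  /* single store, native byte order */
    bufptr += 4;                   /* bufptr 4 +! */
}
```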
<Shelwien> scary :)2019-05-08 09:54:55
 its like this: http://nishi.dreamhosters.com/u/getcputc.inc2019-05-08 09:55:00
<unic0rn> yeah, basic oneliners in forth look like that. basic ones. once you've got abstracted all the lowlevel stuff, it's much better2019-05-08 09:55:29
 yeah, kinda. but i won't bother to rewrite that :P2019-05-08 09:56:05
 instead, here's for curiosity and some laughs, reva forth output2019-05-08 09:56:25
 https://pastebin.com/raw/dfK6esZz2019-05-08 09:56:29
<Shelwien> uh, does it have an implicit loop there somewhere?2019-05-08 09:57:57
 i presume "here" takes the ip?2019-05-08 09:58:06
<unic0rn> no, here returns a pointer to the end of the dictionary2019-05-08 09:58:30
 new definitions go there, but you can also allocate temporary stuff there as needed2019-05-08 09:59:04
 one thing allot does is change the address returned by here2019-05-08 09:59:34
 not like words in general should use here like that, i just didn't want to introduce a variable2019-05-08 10:00:20
<Shelwien> ok2019-05-08 10:00:21
 well, as expected, it would be hard to optimize for llvm2019-05-08 10:00:35
 it could inline calls, but won't be able to reduce memory accesses2019-05-08 10:01:06
<unic0rn> but it would be. rot, over, stuff like that2019-05-08 10:01:30
 it's all stack mangling2019-05-08 10:01:34
 and those words do their own thing, separate from the others2019-05-08 10:01:46
 when you take what they do combined, it can be optimized2019-05-08 10:01:59
 especially when those as basic definitions get inlined2019-05-08 10:02:09
 also, i could just use variable blahblah outside the word definition, replace here with blahblah and remove both allots2019-05-08 10:02:53
<Shelwien> well, compilers have a problem with memory access2019-05-08 10:03:48
 first, if something is supposed to be stored in memory, they'd make sure it is stored there2019-05-08 10:04:27
 i mean, even if it turns out useless at the end of optimization2019-05-08 10:04:47
 the idea is that it can be referenced elsewhere, and then optimizing it away would break the program2019-05-08 10:05:37
<unic0rn> that's a valid problem i guess, good you brought this up2019-05-08 10:06:10
 but there's also a solution2019-05-08 10:06:16
<Shelwien> so given code like you posted, I'd expect llvm to inline the calls at best2019-05-08 10:06:25
 all the memory accesses would remain the same, in same order2019-05-08 10:06:39
<unic0rn> llvm IR has like, infinite registers?2019-05-08 10:06:44
<Shelwien> https://en.wikipedia.org/wiki/Restrict2019-05-08 10:06:59
 yes, but i don't think you could put stack into registers2019-05-08 10:07:13
<unic0rn> you could. compiler would just need to keep track of the stack balance. of course, that means unbalanced loops go out of the window, but those shouldn't be written anyway2019-05-08 10:08:22
 then compiler would alias registers to stack cells during compilation2019-05-08 10:09:00
 llvm would do the rest2019-05-08 10:09:03
 not sure if that's the only solution in case of llvm, not sure how something like restrict would work here, will have to check2019-05-08 10:09:35
 if i can tell llvm "you can do it", that's even better. but if not, it can be done with registers2019-05-08 10:10:06
 it would be tricky though. not sure if llvm makes it possible to limit the scope of registers2019-05-08 10:11:18
 basically, you would want to pass parameters via some "parameter registers" and ignore the rest, because each word can be called from different places with different stack balance2019-05-08 10:12:01
 which changes how registers translate to the stack2019-05-08 10:12:10
 compiling them all separately and then linking is also possible, but that throws away inlining2019-05-08 10:12:39
 but compiler can do inlining anyway, so llvm doesn't have to bother with that2019-05-08 10:13:00
<Shelwien> well, i looked it up2019-05-08 10:13:27
 what about PICK and ROLL?2019-05-08 10:13:35
<unic0rn> unfamiliar. searched the IR docs, didn't see them. i saw stacker - a sample implementation of stack using such words, built on top of llvm2019-05-08 10:18:03
 but i think using memory (or registers) is the only way to make sure redundant things get optimized out2019-05-08 10:18:24
 also, llvm has noalias2019-05-08 10:18:29
 that should help with an array2019-05-08 10:18:34
 so memory access should be fine2019-05-08 10:19:24
 unless you meant forth words2019-05-08 10:21:09
 and registers2019-05-08 10:21:25
<Shelwien> yeah, forth words which would be hard to implement on registers2019-05-08 10:21:42
<unic0rn> well, not hard. impossible, unless limited to immediate values2019-05-08 10:22:13
 which is usually how they're used i guess2019-05-08 10:22:36
<Shelwien> i'm pretty sure there were more of these, especially some internal words used to implement loops and word definition2019-05-08 10:23:11
<unic0rn> a word should only be interested in the top 3, maybe 4 at maximum, elements on the stack, so it's not like someone will use pick to treat the stack like a 1mb array2019-05-08 10:23:54
 there are very few basic words needed2019-05-08 10:24:47
 even fewer operate on the stack at all, that is, do something new with it, instead of using basic operations on the top 3 or so elements2019-05-08 10:25:27
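The point about words touching only the top few cells can be shown with a toy C model of a Forth data stack (names with trailing underscores are invented to avoid keyword clashes); dup/swap/over/rot never reach below the top three cells, which is exactly the part a compiler could keep in registers:

```c
#include <assert.h>

/* Toy model of a Forth data stack; illustrative only. */
#define STACK_MAX 64
static int stk[STACK_MAX];
static int sp = 0;                      /* next free cell */

static void push(int v) { stk[sp++] = v; }
static int  pop(void)   { return stk[--sp]; }

/* ( x -- x x ) */
static void dup_(void)  { int a = pop(); push(a); push(a); }
/* ( x1 x2 -- x2 x1 ) */
static void swap_(void) { int a = pop(), b = pop(); push(a); push(b); }
/* ( x1 x2 -- x1 x2 x1 ) */
static void over_(void) { int a = pop(), b = pop(); push(b); push(a); push(b); }
/* ( x1 x2 x3 -- x2 x3 x1 ) */
static void rot_(void)  { int a = pop(), b = pop(), c = pop();
                          push(b); push(a); push(c); }
```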
<Shelwien> btw, C also has static memory, dynamic allocation is slow and its better if there's none2019-05-08 10:25:34
<unic0rn> well, you don't allocate the stack dynamically2019-05-08 10:26:09
 that is, you allocate it once and that's it2019-05-08 10:26:27
<Shelwien> well, in C I can precompute a log2(int) table and put it into a static array2019-05-08 10:27:01
 what about forth?2019-05-08 10:27:10
<unic0rn> i guess it depends on the implementation. haven't seen that in ans forth2019-05-08 10:27:46
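The C idiom Shelwien refers to, precomputing a log2 table into a static array, looks like this (`log2_tab` and `init_log2` are illustrative names):

```c
#include <assert.h>

/* Precomputed integer log2 table for byte values: log2_tab[i] is the
   position of the highest set bit of i (0 for i <= 1). Filled once at
   startup, then lookups are a single memory read. */
static unsigned char log2_tab[256];

static void init_log2(void) {
    for (int i = 2; i < 256; i++)
        log2_tab[i] = (unsigned char)(log2_tab[i >> 1] + 1);
}
```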
<FunkyBob> hrm... mornign thought... given a match, my code can easily implement a "find next match"... neat :)2019-05-08 10:28:03
<Shelwien> also, its pretty easy to break stack balance2019-05-08 10:28:03
 like push a different number of vars onto stack in different branches2019-05-08 10:28:27
 so i think it would be hard to keep all stack in registers2019-05-08 10:29:50
 and while using registers for top 4 or so values may be more practical2019-05-08 10:30:30
<unic0rn> branches shouldn't be unbalanced, imho2019-05-08 10:30:42
 at least they're not in my code2019-05-08 10:30:58
 it would create a mess2019-05-08 10:31:20
<Shelwien> you'd have to still move them around, rather than moving stack pointer2019-05-08 10:31:35
<unic0rn> as for arrays, you can always use create and allot2019-05-08 10:31:51
 to use the dictionary memory2019-05-08 10:32:02
 that's obviously limited, but there's no problem in telling the compiler how much it should allocate for that2019-05-08 10:32:33
 move them around? not really2019-05-08 10:33:31
 when the compiler keeps track of the stack balance, it knows exactly where each element is2019-05-08 10:33:44
 as i think about it, even inlining words wouldn't be a problem. it would have precalculated register indexes for each word already compiled2019-05-08 10:34:08
 then, when inlining them, it would use those precalculated indexes as base indexes to modify2019-05-08 10:34:26
<Shelwien> for inlined code, sure2019-05-08 10:34:44
<unic0rn> so obviously, depending on how deeply some code would be nested, it would be multipass compilation2019-05-08 10:34:55
 but pretty simple2019-05-08 10:34:58
 well, my first idea for llvm was just to inline everything2019-05-08 10:35:38
 and let llvm do the rest2019-05-08 10:35:49
<Shelwien> its not a bad idea, plenty of small C/C++ programs get fully inlined anyway2019-05-08 10:36:20
<unic0rn> it's not like translating forth words to llvm IR functions would be optimal anyway2019-05-08 10:36:33
 those were designed with C in mind2019-05-08 10:36:38
<Shelwien> but its necessary to make it possible to still have uninlined words2019-05-08 10:37:06
 because inlining only works as expected up to 32k of code2019-05-08 10:38:46
<unic0rn> i guess bigger problems arise from the dynamic nature of forth than that2019-05-08 10:39:54
 words can create other words on the fly2019-05-08 10:40:14
 there's create and does> after all2019-05-08 10:40:25
 although it should be possible to workaround it2019-05-08 10:41:24
<Shelwien> well, if you can make interactive forth with llvm, you can deal with these too2019-05-08 10:41:40
<unic0rn> sure. in interactive mode.2019-05-08 10:41:55
 thing is, how do you compile it2019-05-08 10:42:02
 when you compile a word that creates another word2019-05-08 10:42:13
<Shelwien> with llvm dll?2019-05-08 10:42:13
<unic0rn> yeah, and inlining goes out of the window, so does the overall simplicity of the compiler design2019-05-08 10:42:35
 but there's a reason create for example, takes as a parameter a string read from the source code2019-05-08 10:43:02
<Shelwien> well, 100% inlining requirement is too much of a restriction anyway2019-05-08 10:43:03
<unic0rn> not from the stack2019-05-08 10:43:06
 so when compiling such words, they could be "preexecuted"2019-05-08 10:44:11
 as a matter of fact, that's what immediate does2019-05-08 10:44:19
 as for inlining, in case of registers it's possible to use some mangling/unmangling code for those corner cases that shouldn't be inlined2019-05-08 10:45:21
 compiler then could return the stack-on-registers to its default indexing before calling such word2019-05-08 10:46:04
 llvm would optimize it out anyway2019-05-08 10:46:24
 so it's actually not a problem performance-wise to avoid inlining2019-05-08 10:46:57
 since all such stack mangling operations would be optimized by llvm2019-05-08 10:47:07
<Shelwien> anyway, i still think that aside from plain math, any forth code with complex logic (branches, loops etc) won't be properly optimized2019-05-08 10:47:21
<unic0rn> it heavily depends on how such code is defined on the lowest level2019-05-08 10:47:52
 different forth implementations vary2019-05-08 10:47:58
 for example, some define create as immediate, some don't2019-05-08 10:48:19
 loops can be done in many ways2019-05-08 10:48:41
 at the very bottom, core words would just have llvm IR inlined, obviously2019-05-08 10:49:00
 most likely in some macro form2019-05-08 10:49:20
<Shelwien> yes, but loops would still have to work with stack, or its not forth2019-05-08 10:49:38
<unic0rn> not sure what you mean2019-05-08 10:50:11
<Shelwien> limit index DO ... LOOP2019-05-08 10:51:43
 loop control vars are on stack (also control stack likely)2019-05-08 10:52:10
<unic0rn> and? it's implementation dependent. some forths use very basic words, like 0branch i think, reva for example doesn't have that. i've just looked, it calls (while) which can't be decompiled, it's just a word that does something, hell knows what without checking the code, do and loop do several more things, won't bother analyzing. as i've said, it's all implementation dependent. reva's is most likely 2019-05-08 10:53:09
 not elegant at all2019-05-08 10:53:09
 loop control vars are stored "somewhere"2019-05-08 10:53:37
 doesn't matter where, forth programmer can't access them directly2019-05-08 10:53:51
 can read them via i, j2019-05-08 10:53:55
<Shelwien> you can have nested loops2019-05-08 10:54:03
 so you'd need control stack2019-05-08 10:54:12
<unic0rn> yeah, usually limited to i and j2019-05-08 10:54:49
 2 elements2019-05-08 10:55:01
 you hardly need a whole stack for that2019-05-08 10:55:38
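The "loop control vars live somewhere, readable via i and j" scheme can be sketched in C with a small control stack of (limit, index) pairs; all names are invented, and this is one possible layout, not how reva actually does it:

```c
#include <assert.h>

/* Sketch of DO ... LOOP control values on a separate control stack.
   Each loop pushes a (limit, index) pair; "i" reads the innermost
   index, "j" the next outer one. */
static long ctrl[32];
static int  csp = 0;

static void do_push(long limit, long index) {
    ctrl[csp++] = limit;
    ctrl[csp++] = index;
}
/* LOOP: bump the index, return nonzero while the loop should continue. */
static int do_loop(void) {
    if (++ctrl[csp - 1] < ctrl[csp - 2]) return 1;
    csp -= 2;                           /* loop done, pop the pair */
    return 0;
}
static long i_(void) { return ctrl[csp - 1]; }   /* innermost index  */
static long j_(void) { return ctrl[csp - 3]; }   /* next outer index */
```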
<Shelwien> 10 nested loops?2019-05-08 10:55:43
<unic0rn> haven't tried that.2019-05-08 10:55:57
 but i guess you can use return stack for that2019-05-08 10:56:46
 as a matter of fact, not sure if reva doesn't2019-05-08 10:56:55
<Shelwien> most likely it does2019-05-08 10:57:04
<unic0rn> return stack is mandatory anyway2019-05-08 10:57:17
 and actually, floating point stack as well2019-05-08 10:57:29
<Shelwien> but if they're different stacks2019-05-08 10:57:40
<unic0rn> so yeah, when going with registers, 3 stacks would be needed2019-05-08 10:57:48
<Shelwien> you should be able to push numbers to stack in one loop2019-05-08 10:57:57
 then drop them in another one2019-05-08 10:58:03
 go map that to registers :)2019-05-08 10:58:10
<unic0rn> well, keeping main stack balanced is an acceptable requirement. but pushing stuff to floating point stack could be a problem. same with return stack. in general, the fact that stacks can be out of sync2019-05-08 10:59:23
 the only way would be to have all stacks balanced at all times2019-05-08 11:00:30
<Shelwien> my point is, you most likely can make a forth compiler for your compressor2019-05-08 11:00:37
 but as a forth compiler in general it would be hard to keep it efficient, while supporting all the features2019-05-08 11:01:11
<unic0rn> first i need to finish the compression2019-05-08 11:01:45
<Shelwien> ok :)2019-05-08 11:01:52
<unic0rn> but then, as i've said, llvm ir has noalias2019-05-08 11:01:55
 keeping the stack in memory shouldn't be a problem for optimizations2019-05-08 11:02:05
 in such case, registers aren't a problem2019-05-08 11:02:20
<Shelwien> it would anyway2019-05-08 11:02:32
<unic0rn> why?2019-05-08 11:02:57
<Shelwien> noalias just lets it be sure that writing to memory via another pointer2019-05-08 11:03:08
 won't affect this memory, so you don't have to re-read vars from it2019-05-08 11:03:28
<unic0rn> depending on how llvm's optimization works2019-05-08 11:03:41
<Shelwien> but it still would try to keep values in memory2019-05-08 11:04:08
 unless it can optimize away the whole array2019-05-08 11:04:18
<unic0rn> my point is, it'll see that this is the only code writing to that memory, and that reading from it is also there, so when both operations are close to each other, unless some other piece of code reads from them without writing there first, yeah.2019-05-08 11:05:08
 it should be able to optimize it out2019-05-08 11:05:13
 and it has crazy number of optimizations passes2019-05-08 11:05:35
<Shelwien> tested it: https://godbolt.org/z/R6ld0z2019-05-08 11:05:38
<unic0rn> some recursive i think2019-05-08 11:05:40
<Shelwien> see its, a totally useless array2019-05-08 11:06:27
 but once it can't drop it, it has to write totally useless values to it also2019-05-08 11:06:48
 in that code, if we modify it to array[0] = 0;//_len;2019-05-08 11:07:42
 it would drop the whole array and won't write to it2019-05-08 11:07:58
 but that only happens when it understands the whole thing2019-05-08 11:08:16
 and of course its not just because i used a volatile var there2019-05-08 11:08:38
 its just an easy method to make a var, value of which compiler doesn't know2019-05-08 11:09:02
<unic0rn> thing is, that array isn't useless.2019-05-08 11:17:30
<Shelwien> another version: https://godbolt.org/z/nZQdSX2019-05-08 11:18:19
 but it is2019-05-08 11:18:30
 it doesn't affect anything at all2019-05-08 11:18:48
<unic0rn> true, my mistake.2019-05-08 11:21:28
 i need another coffee i guess.2019-05-08 11:21:34
 that being said2019-05-08 11:21:36
<Shelwien> yet another: https://godbolt.org/z/JmC56u2019-05-08 11:21:45
 in previous clang it was able to write 2 to the array right away, now it can't2019-05-08 11:22:07
<unic0rn> i wonder what would happen with dynamically allocated array2019-05-08 11:22:32
 passed to a function with __restrict__2019-05-08 11:22:45
<Shelwien> that'd be even worse2019-05-08 11:22:45
 it won't be able to optimize it away2019-05-08 11:22:57
 because alloc is an operation with side effects2019-05-08 11:23:13
<unic0rn> oh, it doesn't have to remove the array.2019-05-08 11:23:44
 it just shouldn't bother accessing it.2019-05-08 11:23:54
<Shelwien> and if it doesn't remove it, it would have to keep it up to date2019-05-08 11:24:07
<unic0rn> what for?2019-05-08 11:24:23
 if __restrict__ makes it realize that nothing else reads it?2019-05-08 11:24:37
<Shelwien> restrict only tells compiler that a write to this specific pointer doesn't affect any other memory2019-05-08 11:25:13
 and as to why they keep useless arrays2019-05-08 11:25:39
 its probably just for simplification of array tracking2019-05-08 11:25:58
 otherwise they'd likely have to allocate a var descriptor per array element2019-05-08 11:26:42
 and C++ compilers are already pretty slow as is2019-05-08 11:26:55
 so i guess they decided to track them on array level2019-05-08 11:27:06
 but yeah, its annoying as hell2019-05-08 11:27:24
<unic0rn> doesn't affect any other memory? isn't it the other way around, to guarantee that nothing else accesses the memory being accessed by said pointer?2019-05-08 11:27:38
<Shelwien> for example, making a macro with a list of vars2019-05-08 11:27:49
 and passing the whole list around in function arguments2019-05-08 11:28:00
<unic0rn> so basically only the pointer is being used to access the array and nothing else2019-05-08 11:28:04
<Shelwien> can be much faster than using class methods or passing struct reference to functions2019-05-08 11:28:27
 because individual fields are allocated to registers2019-05-08 11:28:42
 while structs are not2019-05-08 11:28:51
 ---2019-05-08 11:29:37
 see https://en.wikipedia.org/wiki/Restrict again2019-05-08 11:29:41
 restrict keyword is used to determine if write with one pointer affects reads with all other pointers2019-05-08 11:30:23
<unic0rn> https://godbolt.org/z/d7UpSz2019-05-08 11:35:36
 "It says that for the lifetime of the pointer, only the pointer itself or a value directly derived from it (such as pointer + 1) will be used to access the object to which it points."2019-05-08 11:36:39
 that is, nothing else will access that memory2019-05-08 11:36:53
 which is why it can optimize it out2019-05-08 11:37:04
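The quoted guarantee can be illustrated with a minimal C sketch (function name invented): with `restrict` on both parameters, the compiler is allowed to keep `*in` in a register across the store through `out`, because the programmer promises the pointers don't alias:

```c
#include <assert.h>

/* Without restrict, the store to *out could alias *in, forcing a
   re-read of *in from memory; with restrict it can stay in a register. */
static int sum_twice(const int* restrict in, int* restrict out) {
    *out = *in;
    *out += *in;   /* compiler may reuse the cached value of *in */
    return *out;
}
```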
 but compiling C is a mess, so i guess outside of __restrict__ it can't be sure about anything2019-05-08 11:37:46
 with malloc in main, it won't work2019-05-08 11:37:54
<Shelwien> it looks like gcc and clang simply have special support for malloc2019-05-08 11:38:29
 you can see that icc didn't optimize it away2019-05-08 11:38:37
 and if instead of malloc, some wrapper is used...2019-05-08 11:39:06
<unic0rn> malloc just returns a pointer2019-05-08 11:39:27
 or differently2019-05-08 11:39:39
 that pointer is initialized within the function that says that only that pointer will be used to access that memory2019-05-08 11:39:56
 so no matter what initialized that pointer, as long as it's within that function with __restrict__, it should work2019-05-08 11:40:12
<Shelwien> https://godbolt.org/z/--Iq0x2019-05-08 11:40:37
<unic0rn> doesn't matter if malloc spawns another thread reading from that memory over and over2019-05-08 11:40:41
 __restrict__ says it won't2019-05-08 11:40:46
<Shelwien> here gcc still worked, clang didn't2019-05-08 11:40:47
<unic0rn> yeah, noinline2019-05-08 11:41:49
<Shelwien> in your version, gcc and clang just managed to drop the whole array, because they know how malloc works (explicit support)2019-05-08 11:41:54
 well, compare gcc and clang code there2019-05-08 11:42:02
<unic0rn> by definition, __restrict__ works within a function2019-05-08 11:42:07
 if the pointer gets initialized elsewhere, well2019-05-08 11:42:15
<Shelwien> but value returned from xmalloc is assigned to restrict pointer2019-05-08 11:42:26
 so its ok2019-05-08 11:42:27
<unic0rn> well, it is stupid.2019-05-08 11:42:58
 but i guess if xmalloc would be in a lib, then it could work2019-05-08 11:43:07
<Shelwien> sure2019-05-08 11:43:17
<unic0rn> still, in general it's possible.2019-05-08 11:43:18
<Shelwien> as i said, it seems they track arrays and structures per instance2019-05-08 11:43:30
<unic0rn> so llvm should be able to optimize out whole stack2019-05-08 11:43:32
 as long as noalias is used2019-05-08 11:43:43
<Shelwien> so when they can understand the whole state of the array/structure in a block2019-05-08 11:44:05
 they can discard it2019-05-08 11:44:15
 but if there's even one unknown value written to it2019-05-08 11:44:41
 they have to maintain the whole array up to date2019-05-08 11:44:57
 and yeah, it won't be able to optimize away whole stack2019-05-08 11:45:40
 it only can happen when it understands its whole state2019-05-08 11:45:54
<unic0rn> it shouldn't have a problem with that2019-05-08 11:46:08
<Shelwien> so only for simple code that can be precalculated at compile time2019-05-08 11:46:13
<unic0rn> since everything, including core words, will be in a single llvm ir file2019-05-08 11:46:27
 that is, recompiling the app will be recompiling whole implementation2019-05-08 11:46:41
<Shelwien> yes, but they don't analyse that deeply2019-05-08 11:46:43
<unic0rn> as i've said, llvm has multiple passes, some recursive2019-05-08 11:46:58
 it may do it2019-05-08 11:47:03
 and even if not2019-05-08 11:47:05
 optimizing away redundant operations between words is enough2019-05-08 11:47:21
 and gives a huge boost2019-05-08 11:47:29
 also, "that deeply"2019-05-08 11:48:07
 when most of the things will be inlined, that may just do the trick2019-05-08 11:48:19
 also, consider that how forth works, depends on the implementation2019-05-08 11:49:23
<Shelwien> here: https://godbolt.org/z/zPC52x2019-05-08 11:49:33
<unic0rn> when inlining bigger (non-core) words, i don't have to specifically inline them2019-05-08 11:49:38
 it's enough to keep them in the same function for llvm to analyze it all together2019-05-08 11:49:55
 that is, assuming llvm ir has a jump that takes a parameter2019-05-08 11:50:19
 so i can do jumps between locations of a function using return stack2019-05-08 11:50:36
<Shelwien> well, from these tests, it seems that gcc does much better2019-05-08 11:50:47
 although it looks like malloc has better handling than c++ new somehow2019-05-08 11:51:23
 but still, it only works when it understands the whole array state at the end of the function2019-05-08 11:52:00
<unic0rn> what you're doing is like probing compiler's eye with a stick and checking the response ;)2019-05-08 11:52:42
<Shelwien> no, i'm just too lazy to write complex code that it won't be able to comprehend2019-05-08 11:53:09
<unic0rn> i mean, (array - ((uint*)0)) % 3; - what sane person does that :P2019-05-08 11:53:15
<Shelwien> but such code is normal in any real project2019-05-08 11:53:25
<unic0rn> yeah, i know2019-05-08 11:53:30
<Shelwien> ok, let me write something different there2019-05-08 11:53:33
<unic0rn> it just looks funny2019-05-08 11:53:34
<Shelwien> ok, here: https://godbolt.org/z/6A5Z3v2019-05-08 11:58:07
 that loop is enough to make compiler stop trying to understand what happens2019-05-08 11:58:49
 and then it keeps the array and values in it2019-05-08 11:59:01
 even though they're not necessary and output is a constant2019-05-08 11:59:15
 as to ((uint*)0)2019-05-08 12:02:01
 a "sane person" probably would write something like ptrdiff_t(array) instead2019-05-08 12:02:36
 but *_t types are not fully portable2019-05-08 12:03:03
 while (array - ((uint*)0)) has the same type without naming it2019-05-08 12:03:37
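The subtraction trick works because pointer difference yields `ptrdiff_t` without naming the type; subtracting a null pointer is formally undefined behavior, though, and the well-defined form subtracts pointers into the same array (function name invented):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int uint;

/* Same result type as (array - ((uint*)0)), but fully defined:
   both pointers point into the same array object. */
static ptrdiff_t index_of(const uint* p, const uint* base) {
    return p - base;
}
```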
<unic0rn> i broke it2019-05-08 12:05:33
 https://godbolt.org/z/X5nw7L2019-05-08 12:05:35
<Shelwien> well, bound checks don't exist in C/C++2019-05-08 12:07:18
 as to long series of ADD in clang, its simply an optimization (loop unroll)2019-05-08 12:08:18
 you can stop it by adding #pragma unroll(1) before loop2019-05-08 12:08:26
<unic0rn> yeah, i know2019-05-08 12:08:34
 it's just "so you're trying to avoid memory access... well look at THIS"2019-05-08 12:08:52
 made me chuckle2019-05-08 12:09:03
 as for the loop in general, no idea what's happening2019-05-08 12:10:05
<Shelwien> its simply there to stop compiler tracking2019-05-08 12:10:40
 compilers can optimize reasonably small expressions2019-05-08 12:11:00
<unic0rn> it's violating __restrict__, basically2019-05-08 12:11:01
<Shelwien> you mean out-of-bound access?2019-05-08 12:11:25
<unic0rn> yeah2019-05-08 12:11:41
<Shelwien> "If the declaration of intent is not followed and the object is accessed by an independent pointer, this will result in undefined behavior."2019-05-08 12:11:51
<unic0rn> basically, it has to be able to track the offset2019-05-08 12:12:25
 to optimize it out2019-05-08 12:12:30
<Shelwien> it has to be able to track the whole array state2019-05-08 12:12:50
<unic0rn> with forth, that shouldn't be a problem2019-05-08 12:12:54
 yeah2019-05-08 12:12:55
<Shelwien> well, it would be a problem once you have any loops or branches2019-05-08 12:13:20
<unic0rn> it doesn't have to optimize out return stack2019-05-08 12:13:46
 that's separate2019-05-08 12:13:55
<Shelwien> that's unrelated2019-05-08 12:13:57
<unic0rn> as long as the loop is balanced, it shouldn't have a problem2019-05-08 12:14:34
<Shelwien> if a control structure is sufficiently complicated, the compiler would be able to understand what it does2019-05-08 12:14:37
 and once it can't, it would keep the whole array up to date2019-05-08 12:15:01
 *won't be able2019-05-08 12:15:22
 well, for programmers there's now a workaround - via constexpr2019-05-08 12:16:26
<unic0rn> well, it's a matter of keeping the stack operations predictable2019-05-08 12:16:28
<Shelwien> nope, that's unrelated to stack tracking2019-05-08 12:16:52
<unic0rn> that is, my guess is the only option would be to predefine some words for common operations2019-05-08 12:16:53
 yes, it is.2019-05-08 12:17:00
 because once you've got only direct offsets into memory array, it should optimize it out2019-05-08 12:17:22
 things get problematic with stuff like pick2019-05-08 12:17:34
 because it doesn't know which element you'll want to read2019-05-08 12:17:48
<Shelwien> as i said (and demonstrated), it doesn't optimize away an array where it doesn't know even one element value2019-05-08 12:18:07
<unic0rn> but that can be avoided, by defining words doing the same thing with few predefined offsets and scrapping the general case2019-05-08 12:18:22
 not exactly following ans specification, but that may be the price2019-05-08 12:18:36
 that's my point2019-05-08 12:18:56
<Shelwien> and when an element is set to a value which has sufficiently complex dependence chain2019-05-08 12:18:59
 that's enough to stop it from optimizing away the array2019-05-08 12:19:13
<unic0rn> exactly. which is why each offset should be an immediate value2019-05-08 12:19:27
<Shelwien> simple branches would be okay2019-05-08 12:19:29
 but a variable-size loop won't2019-05-08 12:19:38
 same with some arithmetic operations like %2019-05-08 12:20:01
<unic0rn> unbalanced loops are out of the question, that much is obvious2019-05-08 12:20:03
<Shelwien> not unbalanced, just variable-length2019-05-08 12:20:19
 or fixed-length, but long enough2019-05-08 12:20:26
 for example, a loop like for( i=0,s=0; i<100; i++ ) s+=i;2019-05-08 12:20:46
 modern compilers can turn into constant2019-05-08 12:20:55
 because they have it in their pattern library2019-05-08 12:21:07
 but anything non-trivial won't be tracked2019-05-08 12:21:31
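Shelwien's example loop, in a compilable form; gcc and clang do recognize this shape and fold it to the closed form n*(n-1)/2 at compile time when `n` is a constant:

```c
#include <assert.h>

/* for( i=0,s=0; i<100; i++ ) s+=i; as a function: sum of 0..n-1.
   Modern compilers can replace the loop with the closed form. */
static int sum_below(int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}
```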
<unic0rn> assuming s is top of the stack, that's just adding i to the top of the stack2019-05-08 12:21:52
<Shelwien> yes, and that one compilers would understand2019-05-08 12:22:32
<unic0rn> what i'm saying is, unless something is unbalanced, it's entirely predictable - and obvious - which parts are modified2019-05-08 12:22:42
<Shelwien> but how about this: for( i=0,s=0; s%100==0; i++ ) s+=i; ?2019-05-08 12:22:48
 its not unbalanced, right?2019-05-08 12:22:58
 well, in any case2019-05-08 12:23:30
<unic0rn> it's still just accessing top of the stack2019-05-08 12:23:32
<Shelwien> yes, but that's enough to trip the compiler2019-05-08 12:23:45
<unic0rn> maybe in C2019-05-08 12:23:53
<Shelwien> i mean that the compiler won't be able to understand that the result of that is constant2019-05-08 12:24:23
<unic0rn> doesn't have to. it can execute the loop, that's not what i'm after2019-05-08 12:24:47
 but it will optimize out the stack2019-05-08 12:25:03
 s will be a register2019-05-08 12:25:10
 i most likely too, because it shouldn't have trouble tracking the return stack, unless it becomes unbalanced somehow, and that shouldn't happen2019-05-08 12:25:51
<Shelwien> ah, that won't be a problem, sure2019-05-08 12:26:20
<unic0rn> precisely2019-05-08 12:26:25
 and that's where the biggest performance gain is, imho2019-05-08 12:26:35
<Shelwien> but it won't optimize away the array2019-05-08 12:26:37
 and then it would have to store element values to the array2019-05-08 12:27:01
<unic0rn> well, if it will optimize out the stack, then effectively it will optimize out its array2019-05-08 12:27:20
<Shelwien> so tight loops, yeah2019-05-08 12:27:20
 but any complex code would still work with stack array all the time2019-05-08 12:27:38
<unic0rn> no reason.2019-05-08 12:27:58
<Shelwien> also as I already mentioned, you can't inline everything2019-05-08 12:28:08
<unic0rn> even complex code shouldn't keep much on the stack2019-05-08 12:28:20
 or rather, shouldn't keep TOO MUCH2019-05-08 12:28:26
<Shelwien> at least the internal loop shouldn't be more than 32k of native code2019-05-08 12:28:35
 because that's the size of L1 code cache2019-05-08 12:28:46
 and once you have more than that, processing becomes 10x slower2019-05-08 12:29:07
<unic0rn> i'll see later when i'll be coding it, how things look with jumping around inside a single function2019-05-08 12:29:22
 in llvm ir2019-05-08 12:29:26
<Shelwien> sure :)2019-05-08 12:29:33
<unic0rn> because as long as it's a single function, that would be perfect2019-05-08 12:29:37
 doesn't mean i have to copy each word 10 times around2019-05-08 12:29:44
 as for stack, even complex code accesses top 2-3 elements, sometimes 42019-05-08 12:30:38
 the rest is in allocated memory and global variables2019-05-08 12:30:51
 there's really no need for very deep stack, as long as one can keep himself from abusing the stack, which happened to me in the beginning and i was easily loosing track of it2019-05-08 12:31:46
 factoring helps2019-05-08 12:32:06
 and factoring obviously doesn't make the stack more shallow, but then, how deeply can the code go2019-05-08 12:32:23
<Shelwien> well, i have 4k stack for coroutines, usually is enough2019-05-08 12:32:44
<unic0rn> i doubt my compression reaches 10 elements on the stack at any time2019-05-08 12:33:15
 hell, i wouldn't be surprised if it was more around 52019-05-08 12:33:28
<Shelwien> well, it only really accumulates with nesting depth2019-05-08 12:34:15
<unic0rn> yeah2019-05-08 12:36:24
 but then, how much nesting can you do, in a single file2019-05-08 12:37:04
<Shelwien> i still don't understand the benefits of forth, though :)2019-05-08 12:37:11
<unic0rn> to be sure things get optimized properly though, i'll probably have to implement compiling separate files into separate objects2019-05-08 12:37:34
 and linking it all together later2019-05-08 12:37:39
<Shelwien> %)2019-05-08 12:37:45
<unic0rn> because throwing everything together may be risky2019-05-08 12:37:52
 as for benefits, how about keeping me from coding a groundbraking compression by talking about compilers ;)2019-05-08 12:39:37
 thanks for all the insight by the way, very useful2019-05-08 12:39:55
<Shelwien> ok :)2019-05-08 12:40:12
 groundbraking :)2019-05-08 12:40:32
<unic0rn> unfortunately, can't get back to coding right now, because of upcoming appointment, so i'll be back later2019-05-08 12:40:37
 LOL2019-05-08 12:40:39
 i didn't have time to grab another coffee and it shows :P2019-05-08 12:41:00
<Shelwien> well, there's https://handbrake.fr/2019-05-08 12:41:42
<unic0rn> yeah, i prefer ffmpeg. as for forth on llvm, i wonder how it'll do with generating gpu code. never played with programming those, i guess we'll see. but that has time, right now gotta go2019-05-08 12:46:47
<Shelwien> gpu code is basically the same as current x862019-05-08 12:47:56
 vectors, threads2019-05-08 12:47:59
 so nothing that different, i think2019-05-08 12:48:06
*** maniscalco_ has left the channel2019-05-08 13:01:33
 !next2019-05-08 14:40:43