*** maniscalco_ has joined the channel2019-05-07 21:32:03
*** Jibz has left the channel2019-05-07 22:13:24
*** maniscalco__ has joined the channel2019-05-08 00:57:34
*** maniscalco_ has left the channel2019-05-08 00:59:37
*** maniscalco__ has left the channel2019-05-08 01:01:53
*** maniscalco_ has joined the channel2019-05-08 01:54:51
<unic0rn> you've mentioned "intel" and "smart" in one sentence and channel died2019-05-08 04:51:38
 ;)2019-05-08 04:51:39
<Shelwien> well, did you see my benchmark stats for gcc9?2019-05-08 06:38:11
 for example, here's how gcc and icc PGO works:2019-05-08 07:08:06
 3.484s 3.047s: CMo8-gcc91-x64-SSE42019-05-08 07:08:09
 3.422s 3.094s: CMo8-gcc91-x64-SSE4-PGO2019-05-08 07:08:09
 3.437s 3.062s: CMo8-ic19-x64-SSE42019-05-08 07:08:09
 3.188s 2.781s: CMo8-ic19-x64-SSE4-PGO2019-05-08 07:08:09
<unic0rn> nope2019-05-08 07:36:16
 not saying it's not impressive, it's just that when i can avoid C/C++, i will2019-05-08 07:36:58
<Shelwien> i can't say i especially like it either2019-05-08 07:38:54
<unic0rn> for everything else i would rather choose llvm because of the many targets2019-05-08 07:38:55
<Shelwien> i've been programming in asm for a very long time2019-05-08 07:39:37
<unic0rn> well, C/C++ performs well, and it gives the programmers the tools they need. not surprising, considering the amount of C/C++ code around2019-05-08 07:39:44
 but that still doesn't make the language good per se2019-05-08 07:39:56
<Shelwien> made my own DPMI framework for DOS2019-05-08 07:40:12
<unic0rn> ah, dpmi. good old times.2019-05-08 07:40:27
<Shelwien> and basically a custom asm language via tasm macros2019-05-08 07:40:28
<unic0rn> "I CAN HAS 32BIT FLAT MODEL NOW?"2019-05-08 07:40:35
<Shelwien> DPMI wasn't flat, you could use segment registers there2019-05-08 07:40:59
 thing is, intel actually provided a perfect solution for memory allocation in hardware2019-05-08 07:41:23
<unic0rn> yeah, but iirc, it was used with flat model usually2019-05-08 07:41:29
 also, i recall just switching to the flat model was dead simple2019-05-08 07:41:40
<Shelwien> via virtual memory and segments2019-05-08 07:41:41
 with selector:offset pointers2019-05-08 07:42:04
 you can always transparently reallocate the memory block it references2019-05-08 07:42:18
 also virtual memory allows copying only one page, the full pages can simply be remapped to a different addr2019-05-08 07:42:52
 and selector can be transparently reassigned to different base addr which fits the reallocated memory block2019-05-08 07:43:20
 so there was a perfect solution to dynamic memory usage2019-05-08 07:43:41
<unic0rn> back then, i was mostly toying with amiga 600 and m68k2019-05-08 07:43:45
 and all the amiga hardware. had a copy of a perfect book, describing every single bit in it2019-05-08 07:44:09
 so i've made custom video mode that used all the available dma cycles for copper for dynamic palette changes for photo display, so without additional - fast - memory, cpu was able to read new instructions only during vblank2019-05-08 07:44:55
 fun times2019-05-08 07:45:10
<Shelwien> yeah2019-05-08 07:45:37
 i also made a video mode editor for PC2019-05-08 07:45:44
<unic0rn> never really published it though. it was just a personal toy, amiga was half-dead by then anyway2019-05-08 07:46:00
 or actually mostly dead2019-05-08 07:46:13
 not sure if x-mode was more flexible than what amiga did. possibly yes, in frequency terms.2019-05-08 07:47:40
 it still had a few nice tricks though. using overscan area on tv, and switching between 50 and 60hz iirc2019-05-08 07:48:05
 interlace modes in 60hz were actually acceptable2019-05-08 07:48:23
 that reminds me of working with video toaster on pc later on2019-05-08 07:49:27
<Shelwien> the text mode on PC was actually an equivalent of 720x400 16-color graphics2019-05-08 07:49:36
 which wasn't available normally2019-05-08 07:50:01
 so it wasn't possible to make a natural-looking textmode screenshot2019-05-08 07:50:21
<unic0rn> text modes..2019-05-08 07:50:52
<Shelwien> it was also a good thing2019-05-08 07:51:20
<unic0rn> i spent a lot of time with linux and freebsd, but as soon as it became possible, i've started using framebuffers for text consoles2019-05-08 07:51:31
<Shelwien> i ended up having to simulate it when writing this: https://github.com/Shelwien/cmp2019-05-08 07:51:41
 and apparently fast character generation is not an easy task2019-05-08 07:52:04
<unic0rn> it is not. especially today, with all the fancy, vector, antialiased fonts2019-05-08 07:52:53
<Shelwien> first, i had to dump system font to bitmap, since using winapi directly was too slow2019-05-08 07:53:02
 and then there was a problem with coloring it2019-05-08 07:53:22
<unic0rn> it's writing a custom terminal emulator, basically, or rather, just the display part of it2019-05-08 07:53:54
 btw, are you familiar with zx spectrum video memory?2019-05-08 07:54:11
<Shelwien> that is, simply expanding a character from bitmap to specified fore- and background colors2019-05-08 07:54:27
 was also visibly slow2019-05-08 07:54:36
 i ended up having to cache already colored symbols2019-05-08 07:55:04
 as to zx, yes, it was annoying2019-05-08 07:55:39
<unic0rn> lets not forget the extended 512x192 timex mode. actually, mine had additional one, 256x192 but with 8x1 pixel attributes2019-05-08 07:56:21
 never saw a single piece of software using that one2019-05-08 07:56:31
 but mine was damaged anyway, it was displaying only half of the colors, so i didn't bother experimenting that much. if not for the fact i was just a kid, i could perhaps fix it, assuming it was a single bit not going through from the memory to ULA2019-05-08 07:58:19
<Shelwien> btw, digitized music also worked in similar ways at that time2019-05-08 07:58:33
 it was possible to record a song and then play it2019-05-08 07:59:05
<unic0rn> the audio clock on amiga... that was a mess2019-05-08 07:59:20
<Shelwien> but it also requred full cpu capacity2019-05-08 07:59:32
 because 22khz *8 for 1bit beeper...2019-05-08 07:59:45
<unic0rn> yeah, on speccy2019-05-08 08:00:07
<Shelwien> and also data compression was pretty important2019-05-08 08:00:08
<unic0rn> not sure if people didn't try sample playback via white noise or something, on AY2019-05-08 08:00:23
<Shelwien> so we had a few good-quality songs2019-05-08 08:00:33
<unic0rn> i know i've experimented with 14bit (or was it 12, i don't remember) audio on amiga2019-05-08 08:00:46
<Shelwien> as their own programs, which couldn't do anything else2019-05-08 08:00:47
<unic0rn> because the 8bit resolution was subject to 6bit i think volume adjustment2019-05-08 08:01:15
 in hardware2019-05-08 08:01:22
 so you could have 2 channels with higher resolution instead of 2 with lower2019-05-08 08:01:41
 4 with lower*2019-05-08 08:01:47
 that was used a lot actually i think2019-05-08 08:02:10
*** Jibz has joined the channel2019-05-08 08:02:11
<Shelwien> yeah2019-05-08 08:02:34
<unic0rn> as for music, well. the king of the hill was the .mod format for a time2019-05-08 08:02:37
<Shelwien> true, but that was less interesting2019-05-08 08:03:05
<unic0rn> yeah. everything breaking the limits was interesting back then2019-05-08 08:03:30
 after that, came svga and sound blaster2019-05-08 08:03:40
<Shelwien> since you can't just make one for a given song without musical background2019-05-08 08:03:43
<unic0rn> and all that was left was MOAR CYCLES2019-05-08 08:03:49
<Shelwien> while even on 6502 it was possible to digitize a normal song via LPT port2019-05-08 08:04:04
<unic0rn> well, and voodoo graphics2019-05-08 08:04:05
<Shelwien> yeah, and it all became boring2019-05-08 08:04:23
 so i mostly switched to compression2019-05-08 08:04:31
<unic0rn> well, there's raytracing now. and voxels. and GPUs finally capable of rendering, or even raytracing, voxels, even without all that RTX stuff from nvidia2019-05-08 08:05:28
 on 3D graphics front, it's interesting, just way way more complicated than it was before2019-05-08 08:05:54
<Shelwien> its only interesting to watch2019-05-08 08:06:31
<unic0rn> similar with gamedev in general. it may be sad to not be able to push the limits of entirely custom hardware like before, but now that even consoles use standard solutions, one can focus on the software side of the problem, instead of working out the quirks of the hardware2019-05-08 08:07:21
<Shelwien> i'm not an artist, so i can't provide nice models for modern 3D2019-05-08 08:07:24
 and even properly using existing APIs and hardware is already pretty hard2019-05-08 08:08:14
<unic0rn> 3D modelling isn't hard, as long as you've got some drawings/photos as a guide2019-05-08 08:08:18
<Shelwien> i know, i can do it2019-05-08 08:08:38
<unic0rn> well, the APIs are a mess at times2019-05-08 08:08:46
<Shelwien> its just not interesting for me, i'm a programmer2019-05-08 08:08:47
 but there's not much interesting work for a programmer in modern 3D2019-05-08 08:10:09
<unic0rn> the biggest mess for now is i guess the fact that windows API is object oriented. with C++ compilers, it plays nicely... more or less.2019-05-08 08:10:12
<Shelwien> well, runtime 3D2019-05-08 08:10:18
 its just struggling to find workarounds to make weird APIs do what you want2019-05-08 08:10:48
<unic0rn> runtime? considering how raytraced voxels haven't made their way into the gaming world properly just yet, and at the same time, are already possible on current hardware, i would say there's a lot of interesting work to be done2019-05-08 08:11:02
 a lot of it on the gpu side of things2019-05-08 08:11:25
<Shelwien> well, its too complicated2019-05-08 08:11:44
 a very limited number of people can actually contribute2019-05-08 08:11:58
<unic0rn> it's not really rocket science, but it takes some learning2019-05-08 08:12:11
<Shelwien> because they have access to all required documentation and tools2019-05-08 08:12:15
 and know how to use them2019-05-08 08:12:30
<unic0rn> there are examples of raytracing voxels on gpus, with source i guess, or at least very detailed explanation2019-05-08 08:12:32
<Shelwien> i mean2019-05-08 08:12:51
 what's the point of just making a demo showing voxels?2019-05-08 08:13:09
 there was a "mars" demo on DOS which did that2019-05-08 08:13:22
<unic0rn> it isn't really using any hard tricks on the gpu side afaik. now, i didn't experiment with it myself, but i've read a few articles on blogs and saw a few things2019-05-08 08:13:25
<Shelwien> without GPU even2019-05-08 08:13:29
<unic0rn> well, there was 64k intro with software raytracing in the DOS era2019-05-08 08:13:46
 and no, demo makes little sense2019-05-08 08:13:56
 but adding raytraced voxels to some existing engine, like godot, that's something else2019-05-08 08:14:19
 but a lot of interesting stuff is going on on the OS/toolchain front right now, that's also true. flutter for android/iOS/web and desktop possibly/and in the end, for fuchsia2019-05-08 08:15:48
 the whole fuchsia OS2019-05-08 08:15:51
 i wonder where google is going with it2019-05-08 08:16:16
 hopefully to replace both android and chromeos2019-05-08 08:16:29
 and well, the whole "lets invent new language" thing, so popular in recent years. there's kotlin for java diehard fans, there's rust, winning over a lot of C/C++ people2019-05-08 08:18:09
 and it's never been easier to create something new in that regard, thanks to llvm2019-05-08 08:18:28
<Shelwien> rust uses llvm to compile2019-05-08 08:18:36
<unic0rn> yeah, i know.2019-05-08 08:18:43
<Shelwien> so it can't yet compete with C/C++ at performance2019-05-08 08:18:49
<unic0rn> depending on the code, most likely. programmers good at C/C++ are good enough to throw an intrinsic or two at a performance critical function2019-05-08 08:19:33
 sure, the same code, less optimized by hand, will be slower2019-05-08 08:20:00
<Shelwien> well, no point2019-05-08 08:20:15
<unic0rn> but i guess there are other features of the language that make up for that2019-05-08 08:20:19
<Shelwien> rust only provides more work for programmer, to work around its strictness2019-05-08 08:20:57
<unic0rn> sometimes it's strictness that people are after2019-05-08 08:21:19
 one-man's-army's tool may be team's horror show2019-05-08 08:21:38
<Shelwien> in theory, i understand that side of language development2019-05-08 08:21:51
<unic0rn> i prefer to work with my code alone2019-05-08 08:21:59
<Shelwien> yeah2019-05-08 08:22:12
<unic0rn> but i can understand why some people prefer the language taking care of some things and limiting the resulting clusterf..2019-05-08 08:22:18
<Shelwien> i don't like when i have to convince the compiler to do what i want2019-05-08 08:22:33
<unic0rn> it's why i like forth so much2019-05-08 08:22:33
 it puts no limits2019-05-08 08:22:40
<Shelwien> rather than just doing what i asked2019-05-08 08:22:53
<unic0rn> C compiler will track a lot of things and complain, usually rightfully so, but not always, sometimes you know what you're doing, but the compiler doesn't and you need to tell it explicitly2019-05-08 08:23:36
 forth has no such overhead. either you know what you're doing, or it'll crash2019-05-08 08:23:59
<Shelwien> well, one example of what i'm talking about2019-05-08 08:24:15
 C++ has templates2019-05-08 08:24:23
 they're kinda like macros2019-05-08 08:24:37
<unic0rn> you really do love your... oh wait. we've been here before.2019-05-08 08:25:03
<Shelwien> you can substitute a constant or a type into a piece of code2019-05-08 08:25:03
 well, its very very useful, i'd use the same approach even without templates2019-05-08 08:25:46
 its possible to simulate with macros and #include2019-05-08 08:26:00
<unic0rn> well, lispers love lisp for its macros for a reason2019-05-08 08:26:05
 except i guess macros in lisp can do a lot more2019-05-08 08:26:20
<Shelwien> my point is different here2019-05-08 08:26:20
 so, there's this useful language feature2019-05-08 08:26:29
 and with VS/IntelC/clang I can just use it2019-05-08 08:26:50
 that is, usually I'd have a normal function first2019-05-08 08:27:22
 then I'd need multiple instances of it, for example a version for decoding and another for encoding2019-05-08 08:28:01
 so normally I can just add a template< int f_DEC > line before the function declaration2019-05-08 08:28:34
 and that's it, f_DEC would behave like a macro constant2019-05-08 08:28:48
 and compiler would generate two instances of function code2019-05-08 08:29:06
 each separately optimized 2019-05-08 08:29:28
 like, branches only necessary for other modes would disappear etc2019-05-08 08:29:47
 and as I said, it works with 3 different C++ compilers2019-05-08 08:30:10
 and then there's gcc2019-05-08 08:30:18
 where developers decided that somehow at this specific point they want hard standard compliance2019-05-08 08:30:50
 so, when a template has a class parameter2019-05-08 08:31:25
 gcc doesn't want to see the contents of that class2019-05-08 08:32:00
 so I have to add using directives for each "imported" name2019-05-08 08:32:32
 just for gcc2019-05-08 08:32:35
 like this: https://github.com/Shelwien/stegdict/blob/master/model.inc#L62019-05-08 08:32:53
 that's the compiler strictness i mean2019-05-08 08:33:23
 and the actual reason for it2019-05-08 08:33:40
 is that C++ standard group2019-05-08 08:33:55
 wanted to implement modular compilation for templates, like the usual .c/.h approach2019-05-08 08:34:36
 in which case, yeah, the compiler won't see the contents of parameter class, because its defined in other module2019-05-08 08:35:28
 but its only what they wanted2019-05-08 08:35:49
 they actually failed to do it, there's no C++ compiler which lets you split template declarations and implementations2019-05-08 08:36:34
 templates are just fully defined in header files usually, along with all the code2019-05-08 08:37:10
 and still, gcc forced me to write extra declarations, just for it2019-05-08 08:37:33
<unic0rn> do they implement it in all standard modes, or just gnu?2019-05-08 08:38:30
 because if in all, that's just stupid2019-05-08 08:38:43
 i mean, specs are one thing, but the reality is another2019-05-08 08:38:58
<Shelwien> its specified in the main C++ standard2019-05-08 08:39:21
<unic0rn> yeah, i know2019-05-08 08:39:28
<Shelwien> so gcc follows it in all modes2019-05-08 08:39:29
<unic0rn> that's silly. if no other compiler implements it, that's silly2019-05-08 08:40:07
<Shelwien> compared to that, the MS/VS approach to C++ is more user-friendly2019-05-08 08:40:20
 its not just this2019-05-08 08:40:33
 there're lots of features, where MS syntax extensions are something that is useful for a programmer2019-05-08 08:41:01
<unic0rn> the whole GNU-everything is rarely user friendly tbh. it tries to be, but the priority is always being friendly to the ideology and doing the right thing when it comes to the specs i guess2019-05-08 08:41:23
<Shelwien> like access operators for class fields, or various pragmas2019-05-08 08:41:46
 while gcc design choices are usually made based on some abstract concepts2019-05-08 08:42:20
 like, "the theory for this is more interesting"2019-05-08 08:42:50
 and then i have to look for workarounds2019-05-08 08:43:01
<unic0rn> well, extensions in general are another thing. imho it's the language that should define everything that is needed. not even talking about compiler pragmas, because that's another thing, but stuff like intrinsics. "will it work on this compiler? does it differ? if so, how?" why the hell can't we have such things standardized properly2019-05-08 08:44:08
<Shelwien> intrinsics are another fun thing2019-05-08 08:44:49
 they're barely (and only partly) documented2019-05-08 08:45:03
 so hard to expect much :)2019-05-08 08:45:13
<unic0rn> they shouldn't match the platform in the first place2019-05-08 08:45:21
 compiler should take care of mapping that2019-05-08 08:45:30
<Shelwien> the most interesting example for me2019-05-08 08:45:35
<unic0rn> you should just work with vectors, without giving a damn if it'll be compiled for x86 or arm2019-05-08 08:45:53
<Shelwien> was when i found - while reading a fixed bug list for a new version2019-05-08 08:46:06
 that IntelC apparently supports gnu inline asm2019-05-08 08:46:18
 i mean, it also supports MS inline asm, and that's actually documented2019-05-08 08:46:51
 while gnu syntax support is undocumented and limited2019-05-08 08:47:12
 for example, it only supports AT&T asm syntax (in intel compiler!)2019-05-08 08:47:28
 and doesn't support named labels2019-05-08 08:47:36
<unic0rn> someone had to figure out it'll be useful for some customers.2019-05-08 08:47:37
<Shelwien> well, it is very useful2019-05-08 08:47:47
<unic0rn> hint word: customers.2019-05-08 08:47:51
 gcc doesn't have such concept.2019-05-08 08:47:58
<Shelwien> MS-style asm can't be used these days2019-05-08 08:47:59
 because even if you'd write a highly optimized function in asm2019-05-08 08:48:37
 (doesn't matter if its inline asm or separately compiled)2019-05-08 08:48:54
 when using that function, compiler automatically has to assume2019-05-08 08:49:22
 that it would mess up all the registers2019-05-08 08:49:32
 and memory references by all pointers2019-05-08 08:49:53
 so very frequently, translating parts of code to asm doesn't help2019-05-08 08:50:38
 you'd either have to rewrite the whole internal loop in asm2019-05-08 08:50:54
<unic0rn> that's the problem with C/C++ compilers.2019-05-08 08:50:58
 they have to assume a whole lot2019-05-08 08:51:05
<Shelwien> well, gnu inline asm actually provides a solution to that2019-05-08 08:51:12
<unic0rn> throw a wrench inside and they've got a problem2019-05-08 08:51:20
<Shelwien> it has an explicit syntax for what asm code's inputs and outputs2019-05-08 08:51:41
<unic0rn> marking registers?2019-05-08 08:51:43
 yeah2019-05-08 08:51:46
<Shelwien> and what other stuff it modifies2019-05-08 08:52:03
 so this kind of inline asm does improve performance2019-05-08 08:52:22
 unlike externally defined or MS-inline2019-05-08 08:52:35
 and it exists in IntelC2019-05-08 08:52:52
 while still being undocumented :)2019-05-08 08:53:11
<unic0rn> i guess i'll think about it more if i'll be writing my own forth-to-x86-64 compiler. llvm target is the primary idea, so that may take some time2019-05-08 08:53:18
<Shelwien> https://encode.ru/threads/418-Inline-assembly-routines-for-paq8?p=8559&viewfull=1#post85592019-05-08 08:53:54
<unic0rn> reva forth i'm using, isn't doing any optimizations. it's brain dead in that regard. but it's simple and compact.2019-05-08 08:53:56
 if you'll define any word, it'll put a call to it, unless it's marked as inline2019-05-08 08:54:21
<Shelwien> well, 40k is not that compact2019-05-08 08:54:23
<unic0rn> many builtin words are inline, so there's that2019-05-08 08:54:39
 but still, stuff like 2 << gets compiled into "push 2 on the stack" - that's 3 asm instructions2019-05-08 08:55:03
<Shelwien> with C++, you can expect a 3-4k exe file for a simple cmdline utility2019-05-08 08:55:19
<unic0rn> then <<, which loads the parameter from the stack2019-05-08 08:55:26
 and does shl2019-05-08 08:55:33
 i've just replaced all that crap with 3byte inline definition in asm, shl eax, 22019-05-08 08:55:55
 because top of the stack is always in eax2019-05-08 08:56:03
<Shelwien> :)2019-05-08 08:56:11
<unic0rn> but that brings me to possible optimizations, in forth it could be very simple, but then doing inline stuff could become problematic as well, if not downright impossible without disabling those optimizations locally2019-05-08 08:56:59
 marking input and output counts for forth words (usually done with comments anyway)2019-05-08 08:57:27
 and especially, marking the location of inputs in core words, like <<2019-05-08 08:57:56
 if it would be treated like a macro, shl eax, %input12019-05-08 08:58:15
<Shelwien> well, that's why i'm not that excited about forth now2019-05-08 08:58:26
<unic0rn> then it would be trivial for the compiler to optimize it2019-05-08 08:58:31
<Shelwien> though i did use it at some point2019-05-08 08:58:37
 its good to know that this kind of approach exist2019-05-08 08:59:04
<unic0rn> well, it's not really that different from the C/C++ world in that regard. you gotta jump through some hoops to not mess with optimizations done by the compiler with your own assembly code2019-05-08 08:59:11
 unless the compiler will try to parse it all to understand it2019-05-08 08:59:21
<Shelwien> yes, but forth is also fully stack-based2019-05-08 08:59:29
 and while simple arithmetics are usually ok2019-05-08 08:59:42
<unic0rn> that's why i'm willing to target llvm. will see what it'll generate at the assembly level2019-05-08 08:59:59
<Shelwien> there're always some operators which do complicated things with stack2019-05-08 09:00:29
<unic0rn> but it should be able to use many more registers than a regular forth compiler. of course, one can write such optimizations by hand, but what's the point if llvm does that already2019-05-08 09:00:39
<Shelwien> yes2019-05-08 09:00:54
 like i'm saying, normal math can be translated normally2019-05-08 09:01:11
 but branches, loops and such2019-05-08 09:01:23
 are usually implemented in forth too2019-05-08 09:01:50
 they don't use some non-forth logic2019-05-08 09:01:58
<unic0rn> yeah, but 90% of the stuff that ends up being redundant is just there because of the use of the stack. if llvm can figure it out and throw away a whole bunch of code, the end result should be fast2019-05-08 09:02:01
<Shelwien> well, i don't think so2019-05-08 09:02:24
 any kind of operation that does indirect access to stack2019-05-08 09:02:42
<unic0rn> branches are simple2019-05-08 09:02:42
<Shelwien> like access to n'th word indexed by a variable2019-05-08 09:03:07
 and llvm won't save you2019-05-08 09:03:15
<unic0rn> if it's indexed by top of the stack, that should be single asm instruction2019-05-08 09:03:35
<Shelwien> sure2019-05-08 09:03:55
 what i mean, it won't be merged with anything2019-05-08 09:04:07
<unic0rn> in case of reva, mov eax, [esi+eax*4]2019-05-08 09:04:09
<Shelwien> so while C for loop can be vectorized etc2019-05-08 09:04:25
 you won't have that with forth2019-05-08 09:04:35
<unic0rn> that's also not entirely true i think2019-05-08 09:04:57
<Shelwien> i'm not talking about just translating it to C/asm/llvm and making that work2019-05-08 09:05:01
<unic0rn> i was thinking about that some time ago2019-05-08 09:05:07
<Shelwien> but stack operations can be hard to optimize2019-05-08 09:05:14
<unic0rn> if each word has defined inputs and outputs, that is, stack balance2019-05-08 09:05:26
<Shelwien> its kinda similar to refactoring of recursive functions2019-05-08 09:05:30
<unic0rn> then the compiler can keep track of it2019-05-08 09:05:37
 yeah, recursion is a bad case for that2019-05-08 09:05:52
 but regular loops should keep their stack balanced2019-05-08 09:06:05
<Shelwien> well, whole forth is like recursion in a way2019-05-08 09:06:11
<unic0rn> if they don't, they're badly written2019-05-08 09:06:17
<Shelwien> because functions in C can't leave stack unbalanced2019-05-08 09:06:30
<unic0rn> well, it's a different stack.2019-05-08 09:06:50
 there's separate return stack after all, and i wouldn't want to mess the balance of that one2019-05-08 09:07:12
<Shelwien> sure, i mean, you can't map forth words to C functions2019-05-08 09:07:30
<unic0rn> not easily, no2019-05-08 09:07:57
 well, unless a word returns just a single value, then it's very simple2019-05-08 09:08:37
 funny thing, that's the case with my current compression code2019-05-08 09:09:18
 if a word returns anything at all, it's a single value, never more iirc2019-05-08 09:09:29
 but when it comes to constants and global variables, there are over 20 of those altogether2019-05-08 09:10:39
<Shelwien> https://github.com/riywo/llforth :)2019-05-08 09:10:46
<unic0rn> i saw that one2019-05-08 09:11:06
 didn't test it, looks very basic2019-05-08 09:11:11
 i'm looking to write something more serious2019-05-08 09:11:26
<Shelwien> maybe you can test it with your compressor?2019-05-08 09:11:31
 if it improves speed, that'd be good?2019-05-08 09:11:38
<unic0rn> mostly compliant with ANS forth and capable of doing library calls, so for example basic opengl app should be a no brainer2019-05-08 09:11:53
 yeah, that's the idea, when i'm done with coding the compression :P2019-05-08 09:12:05
 i also plan to write it in pretty modular way2019-05-08 09:12:51
 so when the llvm target won't work the way i wanted, i'll just add x86-64 target2019-05-08 09:13:19
 of course it'll be missing the llvm optimizations at the beginning2019-05-08 09:13:41
 but it won't mean the whole code will go out of the window2019-05-08 09:13:57
<Shelwien> hm, the book that it mentions: http://download.library1.org/main/2065000/a76c11fd8609daa1fe299009a8e83a55/Igor%20Zhirkov%20%28auth.%29%20-%20Low-Level%20Programming_%20C%2C%20Assembly%2C%20and%20Program%20Execution%20on%20Intel%C2%AE%2064%20Architecture-Apress%20%282017%29.pdf2019-05-08 09:14:15
<unic0rn> nice one2019-05-08 09:15:26
 as for llvm, i'll have to research it a bit for that implementation to be optimal2019-05-08 09:16:48
 i mean, i could do two modes, compile core words + repl into llvm IR and as a result, to executable, load everything else on the fly and interpret, and second mode - release - noninteractive, compile everything to llvm IR and into executable2019-05-08 09:17:37
 but that isn't optimal2019-05-08 09:17:42
<Shelwien> yeah, need a real compiler2019-05-08 09:18:13
<unic0rn> it should be able to compile definitions on the fly, so it would have to use llvm library for that2019-05-08 09:18:13
 and that's a potential problem actually, depending on what can be done with llvm exactly. most likely everything, but then such interactive mode will go deeper into lowlevel stuff than the actual compilation in release mode2019-05-08 09:19:16
<Shelwien> interpreter is only necessary for reflection and dynamic eval2019-05-08 09:19:25
<unic0rn> because in release mode, you just throw IR at llvm, "do your thing"2019-05-08 09:19:26
<Shelwien> but where would you use that normally?2019-05-08 09:19:37
 well, rather than llvm, I'd just generate C code on output2019-05-08 09:20:04
 that would make it compatible with all C compilers, including llvm2019-05-08 09:20:18
<unic0rn> in interactive mode, the code interacts with each other, and you don't wanna recompile everything on every single change. on the other hand, there's an issue of the stack and sharing it between words compiled separately2019-05-08 09:20:20
 llvm is better suited for such thing, like compiling small fragments of code, the putting it together part is something i gotta research2019-05-08 09:20:57
 many JIT compilers use llvm under the hood2019-05-08 09:21:16
<Shelwien> as i said, why would you need interactive mode? for debug?2019-05-08 09:21:20
<unic0rn> that's the power of forth, after all2019-05-08 09:21:30
 yeah, debug2019-05-08 09:21:33
<Shelwien> yeah, but they're not stack-based usually2019-05-08 09:21:39
<unic0rn> if it's a matter of passing a stack pointer to such dynamically compiled words, i think it's fine2019-05-08 09:22:21
 but there's also the more important question, is it really worth it at all2019-05-08 09:22:40
 dealing with compilation on the fly, instead of just interpreting2019-05-08 09:22:53
<Shelwien> plain compiling yes, interactive compiling - no2019-05-08 09:23:03
<unic0rn> forth interpreter is extremely simple2019-05-08 09:23:04
 well, it would be worth it if the words would be heavy2019-05-08 09:23:17
 but this is forth, that's not the case2019-05-08 09:23:22
<Shelwien> can always add logging if you need to debug specifically the compiled code2019-05-08 09:23:25
<unic0rn> there would be a ton of word calls2019-05-08 09:23:32
 doing little things2019-05-08 09:23:39
 compiling them one by one makes little sense performance-wise2019-05-08 09:23:51
<Shelwien> yeah2019-05-08 09:23:56
<unic0rn> well, i'll keep interactive mode for sure.2019-05-08 09:24:20
 it's fun and it's useful2019-05-08 09:24:26
 it'll just be interpreted2019-05-08 09:24:32
 so a lot slower than compiled mode, but should be fast enough to be useful for testing2019-05-08 09:24:49
 i think i'll just add an option to compile some things, not all.2019-05-08 09:25:12
 so basically it'll be possible to rebuild the whole interpreter with part of the code developer is working on, built in2019-05-08 09:25:55
 as "works, do not touch"2019-05-08 09:26:03
 leaving the parts to be modified, interpreted2019-05-08 09:26:13
 and as for interactive mode, since words are usually doing small things, it's useful to be able to test them by hand with different inputs2019-05-08 09:27:13
 the smaller the words are and the more words there are, the simpler it gets to trace a bug2019-05-08 09:27:51
<Shelwien> dunno, my C++ functions are usually also pretty simple2019-05-08 09:28:07
 and I never use interactive debug, except for compiler bugs2019-05-08 09:28:24
 just add logging or test scripts in the code2019-05-08 09:28:55
<unic0rn> well, the biggest word i've got in compression code, is 20 lines long i think2019-05-08 09:29:27
 most of those are actually single words2019-05-08 09:29:35
 of course, i also have a few onliners that are much more of a mess2019-05-08 09:29:53
 but still, those are oneliners. short moment to write and debug, leave around as a black box that does its thing2019-05-08 09:30:17
<Shelwien> i use perl for these2019-05-08 09:30:42
<unic0rn> i think that's the idea with forth, when it comes to the more lowlevel code, being cut down into very small pieces2019-05-08 09:30:45
 it's fast to write them and it's faster to rewrite them than to debug them2019-05-08 09:31:00
<Shelwien> its got regexp, and that's what is usually need2019-05-08 09:32:21
 but i don't really see where forth would be the best tool2019-05-08 09:32:52
 forth is easier to parse, but its not especially readable2019-05-08 09:33:30
<unic0rn> it can be readable, but it requires some time getting used to2019-05-08 09:33:50
 then it's more readable than everything else i saw2019-05-08 09:34:01
 but getting to writing such code takes time and determination. i've tried forth before. i even swore to never touch it again, because of the mess i've created with it2019-05-08 09:35:09
 now i'm doing much better, still not optimal i think though2019-05-08 09:35:34
<Shelwien> for example: void put4( uint c, FILE* g ) { putc(c,g); putc(c>>8,g); putc(c>>16,g); putc(c>>24,g); }2019-05-08 09:35:53
 how would that look in forth?2019-05-08 09:36:01
<unic0rn> : put4 ( c file -- ) here 4 allot rot over !2019-05-08 09:43:42
  4 swap write ;2019-05-08 09:43:43
 most likely like this2019-05-08 09:43:49
 ah, forgot to deallocate2019-05-08 09:43:58
 add -4 allot at the end2019-05-08 09:44:10
 write is reva forth, ans forth has write-file, same thing only returns ioerror on the stack2019-05-08 09:45:00
 reva has ioerr variable if one wants to check it2019-05-08 09:45:11
<Shelwien> uh, no, your version would be 100x slower :)2019-05-08 09:45:26
<unic0rn> both take address, byte count and fileid2019-05-08 09:45:30
 not sure2019-05-08 09:45:54
<Shelwien> putc writes to memory buffer, its not a direct write call2019-05-08 09:46:29
 also its written without a loop because of inlining2019-05-08 09:47:10
 well, loop would be unrolled and inlined anyway, but loop syntax won't be shorter, but would be harder to read2019-05-08 09:48:20
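Shelwien's `put4` from earlier in the exchange can be sketched as a self-contained C snippet; note the output is little-endian regardless of host byte order, because the least significant byte goes first:

```c
#include <assert.h>
#include <stdio.h>

typedef unsigned int uint;

/* Shelwien's example, reformatted: writes a 32-bit value as four bytes,
   least significant first. putc goes through stdio's internal buffer,
   so this is not four write() syscalls. */
void put4(uint c, FILE* g) {
    putc(c, g);
    putc(c >> 8, g);
    putc(c >> 16, g);
    putc(c >> 24, g);
}
```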
<unic0rn> well, that's comparing apples to oranges. reva isn't the fastest one for sure. and neither reva afaik, nor ans forth, have such buffered calls. that being said, it's a single write in that code2019-05-08 09:48:42
 unless you wanna call it in a loop2019-05-08 09:48:56
 but no one sane will call unbuffered write in a loop2019-05-08 09:49:09
 so yeah, direct translation without knowing the context, is a bad idea here2019-05-08 09:49:34
<Shelwien> well, i thought you'd use putc as another word2019-05-08 09:49:35
<unic0rn> could, went for lazy solution, wasn't sure what you're after2019-05-08 09:50:03
<Shelwien> so expected something like: c 8 shr g putc2019-05-08 09:50:30
 which is hardly more readable than C :)2019-05-08 09:50:56
 but i guess what you posted is an even better example in that sense :)2019-05-08 09:51:14
<unic0rn> it does what it needs within a single word. it isn't optimal2019-05-08 09:51:44
 normally i would throw away all the allots and if it's to be used in a loop, redefine put4 entirely, to write c into a preallocated buffer and increase a pointer variable2019-05-08 09:52:49
 whether it should check for the buffer size and flush automatically, or just expect manual flush, matter of taste, easy enough to do both2019-05-08 09:53:30
<Shelwien> well, that was presumed as things putc already does2019-05-08 09:53:58
<unic0rn> and "save this int to a buffer, increase pointer" is just bufptr @ ! bufptr 4 +!2019-05-08 09:54:41
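The buffer-and-bump-pointer pattern unic0rn describes (`bufptr @ !  bufptr 4 +!`) can be sketched in C; all names here (`buf`, `bufptr`, `buf_put4`) are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of "save this int to a buffer, increase pointer":
   one store into a preallocated buffer plus a pointer bump, the C
   analogue of the Forth one-liner above. */
static uint8_t  buf[1 << 16];
static uint8_t* bufptr = buf;

static void buf_put4(uint32_t c) {
    memcpy(bufptr, &c, sizeof c);  /* single store, native byte order */
    bufptr += 4;                   /* bufptr 4 +! */
}
```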
<Shelwien> scary :)2019-05-08 09:54:55
 its like this: http://nishi.dreamhosters.com/u/getcputc.inc2019-05-08 09:55:00
<unic0rn> yeah, basic oneliners in forth look like that. basic ones. once you've got abstracted all the lowlevel stuff, it's much better2019-05-08 09:55:29
 yeah, kinda. but i won't bother to rewrite that :P2019-05-08 09:56:05
 instead, here's for curiosity and some laughs, reva forth output2019-05-08 09:56:25
 https://pastebin.com/raw/dfK6esZz2019-05-08 09:56:29
<Shelwien> uh, does it have an implicit loop there somewhere?2019-05-08 09:57:57
 i presume "here" takes the ip?2019-05-08 09:58:06
<unic0rn> no, here returns a pointer to the end of the dictionary2019-05-08 09:58:30
 new definitions go there, but you can also allocate temporary stuff there as needed2019-05-08 09:59:04
 one thing allot does is change the address returned by here2019-05-08 09:59:34
 not like words in general should use here like that, i just didn't want to introduce a variable2019-05-08 10:00:20
<Shelwien> ok2019-05-08 10:00:21
 well, as expected, it would be hard to optimize for llvm2019-05-08 10:00:35
 it could inline calls, but won't be able to reduce memory accesses2019-05-08 10:01:06
<unic0rn> but it would be. rot, over, stuff like that2019-05-08 10:01:30
 it's all stack mangling2019-05-08 10:01:34
 and those words do their own thing, separate from the others2019-05-08 10:01:46
 when you take what they do combined, it can be optimized2019-05-08 10:01:59
 especially when those as basic definitions get inlined2019-05-08 10:02:09
 also, i could just use variable blahblah outside the word definition, replace here with blahblah and remove both allots2019-05-08 10:02:53
<Shelwien> well, compilers have a problem with memory access2019-05-08 10:03:48
 first, if something is supposed to be stored in memory, they'd make sure it is stored there2019-05-08 10:04:27
 i mean, even if it turns out useless at the end of optimization2019-05-08 10:04:47
 the idea is that it can be referenced elsewhere, and then optimizing it away would break the program2019-05-08 10:05:37
<unic0rn> that's a valid problem i guess, good you brought this up2019-05-08 10:06:10
 but there's also a solution2019-05-08 10:06:16
<Shelwien> so given code like you posted, I'd expect llvm to inline the calls at best2019-05-08 10:06:25
 all the memory accesses would remain the same, in same order2019-05-08 10:06:39
<unic0rn> llvm IR has like, infinite registers?2019-05-08 10:06:44
<Shelwien> https://en.wikipedia.org/wiki/Restrict2019-05-08 10:06:59
 yes, but i don't think you could put stack into registers2019-05-08 10:07:13
<unic0rn> you could. compiler would just need to keep track of the stack balance. of course, that means unbalanced loops go out of the window, but those shouldn't be written anyway2019-05-08 10:08:22
 then compiler would alias registers to stack cells during compilation2019-05-08 10:09:00
 llvm would do the rest2019-05-08 10:09:03
 not sure if that's the only solution in case of llvm, not sure how something like restrict would work here, will have to check2019-05-08 10:09:35
 if i can tell llvm "you can do it", that's even better. but if not, it can be done with registers2019-05-08 10:10:06
 it would be tricky though. not sure if llvm makes it possible to limit the scope of registers2019-05-08 10:11:18
 basically, you would want to pass parameters via some "parameter registers" and ignore the rest, because each word can be called from different places with different stack balance2019-05-08 10:12:01
 which changes how registers translate to the stack2019-05-08 10:12:10
 compiling them all separately and then linking is also possible, but that throws away inlining2019-05-08 10:12:39
 but compiler can do inlining anyway, so llvm doesn't have to bother with that2019-05-08 10:13:00
<Shelwien> well, i looked it up2019-05-08 10:13:27
 what about PICK and ROLL?2019-05-08 10:13:35
<unic0rn> unfamiliar. searched the IR docs, didn't see them. i saw stacker - a sample implementation of stack using such words, built on top of llvm2019-05-08 10:18:03
 but i think using memory (or registers) is the only way to make sure redundant things get optimized out2019-05-08 10:18:24
 also, llvm has noalias2019-05-08 10:18:29
 that should help with an array2019-05-08 10:18:34
 so memory access should be fine2019-05-08 10:19:24
 unless you meant forth words2019-05-08 10:21:09
 and registers2019-05-08 10:21:25
<Shelwien> yeah, forth words which would be hard to implement on registers2019-05-08 10:21:42
<unic0rn> well, not hard. impossible, unless limited to immediate values2019-05-08 10:22:13
 which is usually how they're used i guess2019-05-08 10:22:36
<Shelwien> i'm pretty sure there were more of these, especially some internal words used to implement loops and word definition2019-05-08 10:23:11
<unic0rn> a word should only be interested in the top 3, maybe 4 at maximum, elements on the stack, so it's not like someone will use pick to treat the stack like a 1mb array2019-05-08 10:23:54
 there are very few basic words needed2019-05-08 10:24:47
 even fewer operate on the stack at all, that is, do something new with it, instead of using basic operations on the top 3 or so elements2019-05-08 10:25:27
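The point about words touching only the top few cells can be shown with a toy C model of a Forth data stack (names with trailing underscores are invented to avoid keyword clashes); dup/swap/over/rot never reach below the top three cells, which is exactly the part a compiler could keep in registers:

```c
#include <assert.h>

/* Toy model of a Forth data stack; illustrative only. */
#define STACK_MAX 64
static int stk[STACK_MAX];
static int sp = 0;                      /* next free cell */

static void push(int v) { stk[sp++] = v; }
static int  pop(void)   { return stk[--sp]; }

/* ( x -- x x ) */
static void dup_(void)  { int a = pop(); push(a); push(a); }
/* ( x1 x2 -- x2 x1 ) */
static void swap_(void) { int a = pop(), b = pop(); push(a); push(b); }
/* ( x1 x2 -- x1 x2 x1 ) */
static void over_(void) { int a = pop(), b = pop(); push(b); push(a); push(b); }
/* ( x1 x2 x3 -- x2 x3 x1 ) */
static void rot_(void)  { int a = pop(), b = pop(), c = pop();
                          push(b); push(a); push(c); }
```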
<Shelwien> btw, C also has static memory, dynamic allocation is slow and its better if there's none2019-05-08 10:25:34
<unic0rn> well, you don't allocate the stack dynamically2019-05-08 10:26:09
 that is, you allocate it once and that's it2019-05-08 10:26:27
<Shelwien> well, in C I can precompute a log2(int) table and put it into a static array2019-05-08 10:27:01
 what about forth?2019-05-08 10:27:10
<unic0rn> i guess it depends on the implementation. haven't seen that in ans forth2019-05-08 10:27:46
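The C idiom Shelwien refers to, precomputing a log2 table into a static array, looks like this (`log2_tab` and `init_log2` are illustrative names):

```c
#include <assert.h>

/* Precomputed integer log2 table for byte values: log2_tab[i] is the
   position of the highest set bit of i (0 for i <= 1). Filled once at
   startup, then lookups are a single memory read. */
static unsigned char log2_tab[256];

static void init_log2(void) {
    for (int i = 2; i < 256; i++)
        log2_tab[i] = (unsigned char)(log2_tab[i >> 1] + 1);
}
```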
<FunkyBob> hrm... mornign thought... given a match, my code can easily implement a "find next match"... neat :)2019-05-08 10:28:03
<Shelwien> also, its pretty easy to break stack balance2019-05-08 10:28:03
 like push a different number of vars onto stack in different branches2019-05-08 10:28:27
 so i think it would be hard to keep all stack in registers2019-05-08 10:29:50
 and while using registers for top 4 or so values may be more practical2019-05-08 10:30:30
<unic0rn> branches shouldn't be unbalanced, imho2019-05-08 10:30:42
 at least they're not in my code2019-05-08 10:30:58
 it would create a mess2019-05-08 10:31:20
<Shelwien> you'd have to still move them around, rather than moving stack pointer2019-05-08 10:31:35
<unic0rn> as for arrays, you can always use create and allot2019-05-08 10:31:51
 to use the dictionary memory2019-05-08 10:32:02
 that's obviously limited, but there's no problem in telling the compiler how much it should allocate for that2019-05-08 10:32:33
 move them around? not really2019-05-08 10:33:31
 when the compiler keeps track of the stack balance, it knows exactly where each element is2019-05-08 10:33:44
 as i think about it, even inlining words wouldn't be a problem. it would have precalculated register indexes for each word already compiled2019-05-08 10:34:08
 then, when inlining them, it would use those precalculated indexes as base indexes to modify2019-05-08 10:34:26
<Shelwien> for inlined code, sure2019-05-08 10:34:44
<unic0rn> so obviously, depending on how deeply some code would be nested, it would be multipass compilation2019-05-08 10:34:55
 but pretty simple2019-05-08 10:34:58
 well, my first idea for llvm was just to inline everything2019-05-08 10:35:38
 and let llvm do the rest2019-05-08 10:35:49
<Shelwien> its not a bad idea, plenty of small C/C++ programs get fully inlined anyway2019-05-08 10:36:20
<unic0rn> it's not like translating forth words to llvm IR functions would be optimal anyway2019-05-08 10:36:33
 those were designed with C in mind2019-05-08 10:36:38
<Shelwien> but its necessary to make it possible to still have uninlined words2019-05-08 10:37:06
 because inlining only works as expected up to 32k of code2019-05-08 10:38:46
<unic0rn> i guess bigger problems arise from the dynamic nature of forth than that2019-05-08 10:39:54
 words can create other words on the fly2019-05-08 10:40:14
 there's create and does> after all2019-05-08 10:40:25
 although it should be possible to workaround it2019-05-08 10:41:24
<Shelwien> well, if you can make interactive forth with llvm, you can deal with these too2019-05-08 10:41:40
<unic0rn> sure. in interactive mode.2019-05-08 10:41:55
 thing is, how do you compile it2019-05-08 10:42:02
 when you compile a word that creates another word2019-05-08 10:42:13
<Shelwien> with llvm dll?2019-05-08 10:42:13
<unic0rn> yeah, and inlining goes out of the window, so does the overall simplicity of the compiler design2019-05-08 10:42:35
 but there's a reason create for example, takes as a parameter a string read from the source code2019-05-08 10:43:02
<Shelwien> well, 100% inlining requirement is too much of a restriction anyway2019-05-08 10:43:03
<unic0rn> not from the stack2019-05-08 10:43:06
 so when compiling such words, they could be "preexecuted"2019-05-08 10:44:11
 as a matter of fact, that's what immediate does2019-05-08 10:44:19
 as for inlining, in case of registers it's possible to use some mangling/unmangling code for those corner cases that shouldn't be inlined2019-05-08 10:45:21
 compiler then could return the stack-on-registers to its default indexing before calling such word2019-05-08 10:46:04
 llvm would optimize it out anyway2019-05-08 10:46:24
 so it's actually not a problem performance-wise to avoid inlining2019-05-08 10:46:57
 since all such stack mangling operations would be optimized by llvm2019-05-08 10:47:07
<Shelwien> anyway, i still think that aside from plain math, any forth code with complex logic (branches, loops etc) won't be properly optimized2019-05-08 10:47:21
<unic0rn> it heavily depends on how such code is defined on the lowest level2019-05-08 10:47:52
 different forth implementations vary2019-05-08 10:47:58
 for example, some define create as immediate, some don't2019-05-08 10:48:19
 loops can be done in many ways2019-05-08 10:48:41
 at the very bottom, core words would just have llvm IR inlined, obviously2019-05-08 10:49:00
 most likely in some macro form2019-05-08 10:49:20
<Shelwien> yes, but loops would still have to work with stack, or its not forth2019-05-08 10:49:38
<unic0rn> not sure what you mean2019-05-08 10:50:11
<Shelwien> limit index DO ... LOOP2019-05-08 10:51:43
 loop control vars are on stack (also control stack likely)2019-05-08 10:52:10
<unic0rn> and? it's implementation dependent. some forths use very basic words, like 0branch i think, reva for example doesn't have that. i've just looked, it calls (while) which can't be decompiled, it's just a word that does something, hell knows what without checking the code, do and loop do several more things, won't bother analyzing. as i've said, it's all implementation dependent. reva's is most likely 2019-05-08 10:53:09
 not elegant at all2019-05-08 10:53:09
 loop control vars are stored "somewhere"2019-05-08 10:53:37
 doesn't matter where, forth programmer can't access them directly2019-05-08 10:53:51
 can read them via i, j2019-05-08 10:53:55
<Shelwien> you can have nested loops2019-05-08 10:54:03
 so you'd need control stack2019-05-08 10:54:12
<unic0rn> yeah, usually limited to i and j2019-05-08 10:54:49
 2 elements2019-05-08 10:55:01
 you hardly need a whole stack for that2019-05-08 10:55:38
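The "loop control vars live somewhere, readable via i and j" scheme can be sketched in C with a small control stack of (limit, index) pairs; all names are invented, and this is one possible layout, not how reva actually does it:

```c
#include <assert.h>

/* Sketch of DO ... LOOP control values on a separate control stack.
   Each loop pushes a (limit, index) pair; "i" reads the innermost
   index, "j" the next outer one. */
static long ctrl[32];
static int  csp = 0;

static void do_push(long limit, long index) {
    ctrl[csp++] = limit;
    ctrl[csp++] = index;
}
/* LOOP: bump the index, return nonzero while the loop should continue. */
static int do_loop(void) {
    if (++ctrl[csp - 1] < ctrl[csp - 2]) return 1;
    csp -= 2;                           /* loop done, pop the pair */
    return 0;
}
static long i_(void) { return ctrl[csp - 1]; }   /* innermost index  */
static long j_(void) { return ctrl[csp - 3]; }   /* next outer index */
```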
<Shelwien> 10 nested loops?2019-05-08 10:55:43
<unic0rn> haven't tried that.2019-05-08 10:55:57
 but i guess you can use return stack for that2019-05-08 10:56:46
 as a matter of fact, not sure if reva doesn't2019-05-08 10:56:55
<Shelwien> most likely it does2019-05-08 10:57:04
<unic0rn> return stack is mandatory anyway2019-05-08 10:57:17
 and actually, floating point stack as well2019-05-08 10:57:29
<Shelwien> but if they're different stacks2019-05-08 10:57:40
<unic0rn> so yeah, when going with registers, 3 stacks would be needed2019-05-08 10:57:48
<Shelwien> you should be able to push numbers to stack in one loop2019-05-08 10:57:57
 then drop them in another one2019-05-08 10:58:03
 go map that to registers :)2019-05-08 10:58:10
<unic0rn> well, keeping main stack balanced is an acceptable requirement. but pushing stuff to floating point stack could be a problem. same with return stack. in general, the fact that stacks can be out of sync2019-05-08 10:59:23
 the only way would be to have all stacks balanced at all times2019-05-08 11:00:30
<Shelwien> my point is, you most likely can make a forth compiler for your compressor2019-05-08 11:00:37
 but as a forth compiler in general it would be hard to keep it efficient, while supporting all the features2019-05-08 11:01:11
<unic0rn> first i need to finish the compression2019-05-08 11:01:45
<Shelwien> ok :)2019-05-08 11:01:52
<unic0rn> but then, as i've said, llvm ir has noalias2019-05-08 11:01:55
 keeping the stack in memory shouldn't be a problem for optimizations2019-05-08 11:02:05
 in such case, registers aren't a problem2019-05-08 11:02:20
<Shelwien> it would anyway2019-05-08 11:02:32
<unic0rn> why?2019-05-08 11:02:57
<Shelwien> noalias just lets it be sure that writing to memory via another pointer2019-05-08 11:03:08
 won't affect this memory, so you don't have to re-read vars from it2019-05-08 11:03:28
<unic0rn> depending on how llvm's optimization works2019-05-08 11:03:41
<Shelwien> but it still would try to keep values in memory2019-05-08 11:04:08
 unless it can optimize away the whole array2019-05-08 11:04:18
<unic0rn> my point is, it'll see that this is the only code writing to that memory, and that reading from it is also there, so when both operations are close to each other, unless some other piece of code reads from them without writing there first, yeah.2019-05-08 11:05:08
 it should be able to optimize it out2019-05-08 11:05:13
 and it has crazy number of optimizations passes2019-05-08 11:05:35
<Shelwien> tested it: https://godbolt.org/z/R6ld0z2019-05-08 11:05:38
<unic0rn> some recursive i think2019-05-08 11:05:40
<Shelwien> see its, a totally useless array2019-05-08 11:06:27
 but once it can't drop it, it has to write totally useless values to it also2019-05-08 11:06:48
 in that code, if we modify it to array[0] = 0;//_len;2019-05-08 11:07:42
 it would drop the whole array and won't write to it2019-05-08 11:07:58
 but that only happens when it understands the whole thing2019-05-08 11:08:16
 and of course its not just because i used a volatile var there2019-05-08 11:08:38
 its just an easy method to make a var, value of which compiler doesn't know2019-05-08 11:09:02
<unic0rn> thing is, that array isn't useless.2019-05-08 11:17:30
<Shelwien> another version: https://godbolt.org/z/nZQdSX2019-05-08 11:18:19
 but it is2019-05-08 11:18:30
 it doesn't affect anything at all2019-05-08 11:18:48
<unic0rn> true, my mistake.2019-05-08 11:21:28
 i need another coffee i guess.2019-05-08 11:21:34
 that being said2019-05-08 11:21:36
<Shelwien> yet another: https://godbolt.org/z/JmC56u2019-05-08 11:21:45
 in previous clang it was able to write 2 to the array right away, now it can't2019-05-08 11:22:07
<unic0rn> i wonder what would happen with dynamically allocated array2019-05-08 11:22:32
 passed to a function with __restrict__2019-05-08 11:22:45
<Shelwien> that'd be even worse2019-05-08 11:22:45
 it won't be able to optimize it away2019-05-08 11:22:57
 because alloc is an operation with side effects2019-05-08 11:23:13
<unic0rn> oh, it doesn't have to remove the array.2019-05-08 11:23:44
 it just shouldn't bother accessing it.2019-05-08 11:23:54
<Shelwien> and if it doesn't remove it, it would have to keep it up to date2019-05-08 11:24:07
<unic0rn> what for?2019-05-08 11:24:23
 if __restrict__ makes it realize that nothing else reads it?2019-05-08 11:24:37
<Shelwien> restrict only tells compiler that a write to this specific pointer doesn't affect any other memory2019-05-08 11:25:13
 and as to why they keep useless arrays2019-05-08 11:25:39
 its probably just for simplification of array tracking2019-05-08 11:25:58
 otherwise they'd likely have to allocate a var descriptor per array element2019-05-08 11:26:42
 and C++ compilers are already pretty slow as is2019-05-08 11:26:55
 so i guess they decided to track them on array level2019-05-08 11:27:06
 but yeah, its annoying as hell2019-05-08 11:27:24
<unic0rn> doesn't affect any other memory? isn't it the other way around, to guarantee that nothing else accesses the memory being accessed by said pointer?2019-05-08 11:27:38
<Shelwien> for example, making a macro with a list of vars2019-05-08 11:27:49
 and passing the whole list around in function arguments2019-05-08 11:28:00
<unic0rn> so basically only the pointer is being used to access the array and nothing else2019-05-08 11:28:04
<Shelwien> can be much faster than using class methods or passing struct reference to functions2019-05-08 11:28:27
 because individual fields are allocated to registers2019-05-08 11:28:42
 while structs are not2019-05-08 11:28:51
 ---2019-05-08 11:29:37
 see https://en.wikipedia.org/wiki/Restrict again2019-05-08 11:29:41
 restrict keyword is used to determine if write with one pointer affects reads with all other pointers2019-05-08 11:30:23
<unic0rn> https://godbolt.org/z/d7UpSz2019-05-08 11:35:36
 "It says that for the lifetime of the pointer, only the pointer itself or a value directly derived from it (such as pointer + 1) will be used to access the object to which it points."2019-05-08 11:36:39
 that is, nothing else will access that memory2019-05-08 11:36:53
 which is why it can optimize it out2019-05-08 11:37:04
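The quoted guarantee can be illustrated with a minimal C sketch (function name invented): with `restrict` on both parameters, the compiler is allowed to keep `*in` in a register across the store through `out`, because the programmer promises the pointers don't alias:

```c
#include <assert.h>

/* Without restrict, the store to *out could alias *in, forcing a
   re-read of *in from memory; with restrict it can stay in a register. */
static int sum_twice(const int* restrict in, int* restrict out) {
    *out = *in;
    *out += *in;   /* compiler may reuse the cached value of *in */
    return *out;
}
```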
 but compiling C is a mess, so i guess outside of __restrict__ it can't be sure about anything2019-05-08 11:37:46
 with malloc in main, it won't work2019-05-08 11:37:54
<Shelwien> it looks like gcc and clang simply have special support for malloc2019-05-08 11:38:29
 you can see that icc didn't optimize it away2019-05-08 11:38:37
 and if instead of malloc, some wrapper is used...2019-05-08 11:39:06
<unic0rn> malloc just returns a pointer2019-05-08 11:39:27
 or differently2019-05-08 11:39:39
 that pointer is initialized within the function that says that only that pointer will be used to access that memory2019-05-08 11:39:56
 so no matter what initialized that pointer, as long as it's within that function with __restrict__, it should work2019-05-08 11:40:12
<Shelwien> https://godbolt.org/z/--Iq0x2019-05-08 11:40:37
<unic0rn> doesn't matter if malloc spawns another thread reading from that memory over and over2019-05-08 11:40:41
 __restrict__ says it won't2019-05-08 11:40:46
<Shelwien> here gcc still worked, clang didn't2019-05-08 11:40:47
<unic0rn> yeah, noinline2019-05-08 11:41:49
<Shelwien> in your version, gcc and clang just managed to drop the whole array, because they know how malloc works (explicit support)2019-05-08 11:41:54
 well, compare gcc and clang code there2019-05-08 11:42:02
<unic0rn> by definition, __restrict__ works within a function2019-05-08 11:42:07
 if the pointer gets initialized elsewhere, well2019-05-08 11:42:15
<Shelwien> but value returned from xmalloc is assigned to restrict pointer2019-05-08 11:42:26
 so its ok2019-05-08 11:42:27
<unic0rn> well, it is stupid.2019-05-08 11:42:58
 but i guess if xmalloc would be in a lib, then it could work2019-05-08 11:43:07
<Shelwien> sure2019-05-08 11:43:17
<unic0rn> still, in general it's possible.2019-05-08 11:43:18
<Shelwien> as i said, it seems they track arrays and structures per instance2019-05-08 11:43:30
<unic0rn> so llvm should be able to optimize out whole stack2019-05-08 11:43:32
 as long as noalias is used2019-05-08 11:43:43
<Shelwien> so when they can understand the whole state of the array/structure in a block2019-05-08 11:44:05
 they can discard it2019-05-08 11:44:15
 but if there's even one unknown value written to it2019-05-08 11:44:41
 they have to maintain the whole array up to date2019-05-08 11:44:57
 and yeah, it won't be able to optimize away whole stack2019-05-08 11:45:40
 it only can happen when it understands its whole state2019-05-08 11:45:54
<unic0rn> it shouldn't have a problem with that2019-05-08 11:46:08
<Shelwien> so only for simple code that can be precalculated at compile time2019-05-08 11:46:13
<unic0rn> since everything, including core words, will be in a single llvm ir file2019-05-08 11:46:27
 that is, recompiling the app will be recompiling whole implementation2019-05-08 11:46:41
<Shelwien> yes, but they don't analyse that deeply2019-05-08 11:46:43
<unic0rn> as i've said, llvm has multiple passes, some recursive2019-05-08 11:46:58
 it may do it2019-05-08 11:47:03
 and even if not2019-05-08 11:47:05
 optimizing away redundant operations between words is enough2019-05-08 11:47:21
 and gives a huge boost2019-05-08 11:47:29
 also, "that deeply"2019-05-08 11:48:07
 when most of the things will be inlined, that may just do the trick2019-05-08 11:48:19
 also, consider that how forth works, depends on the implementation2019-05-08 11:49:23
<Shelwien> here: https://godbolt.org/z/zPC52x2019-05-08 11:49:33
<unic0rn> when inlining bigger (non-core) words, i don't have to specifically inline them2019-05-08 11:49:38
 it's enough to keep them in the same function for llvm to analyze it all together2019-05-08 11:49:55
 that is, assuming llvm ir has a jump that takes a parameter2019-05-08 11:50:19
 so i can do jumps between locations of a function using return stack2019-05-08 11:50:36
<Shelwien> well, from these tests, it seems that gcc does much better2019-05-08 11:50:47
 although it looks like malloc has better handling than c++ new somehow2019-05-08 11:51:23
 but still, it only works when it understands the whole array state at the end of the function2019-05-08 11:52:00
<unic0rn> what you're doing is like probing compiler's eye with a stick and checking the response ;)2019-05-08 11:52:42
<Shelwien> no, i'm just too lazy to write complex code that it won't be able to comprehend2019-05-08 11:53:09
<unic0rn> i mean, (array - ((uint*)0)) % 3; - what sane person does that :P2019-05-08 11:53:15
<Shelwien> but such code is normal in any real project2019-05-08 11:53:25
<unic0rn> yeah, i know2019-05-08 11:53:30
<Shelwien> ok, let me write something different there2019-05-08 11:53:33
<unic0rn> it just looks funny2019-05-08 11:53:34
<Shelwien> ok, here: https://godbolt.org/z/6A5Z3v2019-05-08 11:58:07
 that loop is enough to make compiler stop trying to understand what happens2019-05-08 11:58:49
 and then it keeps the array and values in it2019-05-08 11:59:01
 even though they're not necessary and output is a constant2019-05-08 11:59:15
 as to ((uint*)0)2019-05-08 12:02:01
 a "sane person" probably would write something like ptrdiff_t(array) instead2019-05-08 12:02:36
 but *_t types are not fully portable2019-05-08 12:03:03
 while (array - ((uint*)0)) has the same type without naming it2019-05-08 12:03:37
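The subtraction trick works because pointer difference yields `ptrdiff_t` without naming the type; subtracting a null pointer is formally undefined behavior, though, and the well-defined form subtracts pointers into the same array (function name invented):

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int uint;

/* Same result type as (array - ((uint*)0)), but fully defined:
   both pointers point into the same array object. */
static ptrdiff_t index_of(const uint* p, const uint* base) {
    return p - base;
}
```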
<unic0rn> i broke it2019-05-08 12:05:33
 https://godbolt.org/z/X5nw7L2019-05-08 12:05:35
<Shelwien> well, bound checks don't exist in C/C++2019-05-08 12:07:18
 as to long series of ADD in clang, its simply an optimization (loop unroll)2019-05-08 12:08:18
 you can stop it by adding #pragma unroll(1) before loop2019-05-08 12:08:26
<unic0rn> yeah, i know2019-05-08 12:08:34
 it's just "so you're trying to avoid memory access... well look at THIS"2019-05-08 12:08:52
 made me chuckle2019-05-08 12:09:03
 as for the loop in general, no idea what's happening2019-05-08 12:10:05
<Shelwien> its simply there to stop compiler tracking2019-05-08 12:10:40
 compilers can optimize reasonably small expressions2019-05-08 12:11:00
<unic0rn> it's violating __restrict__, basically2019-05-08 12:11:01
<Shelwien> you mean out-of-bound access?2019-05-08 12:11:25
<unic0rn> yeah2019-05-08 12:11:41
<Shelwien> "If the declaration of intent is not followed and the object is accessed by an independent pointer, this will result in undefined behavior."2019-05-08 12:11:51
<unic0rn> basically, it has to be able to track the offset2019-05-08 12:12:25
 to optimize it out2019-05-08 12:12:30
<Shelwien> it has to be able to track the whole array state2019-05-08 12:12:50
<unic0rn> with forth, that shouldn't be a problem2019-05-08 12:12:54
 yeah2019-05-08 12:12:55
<Shelwien> well, it would be a problem once you have any loops or branches2019-05-08 12:13:20
<unic0rn> it doesn't have to optimize out return stack2019-05-08 12:13:46
 that's separate2019-05-08 12:13:55
<Shelwien> that's unrelated2019-05-08 12:13:57
<unic0rn> as long as the loop is balanced, it shouldn't have a problem2019-05-08 12:14:34
<Shelwien> if a control structure is sufficiently complicated, the compiler would be able to understand what it does2019-05-08 12:14:37
 and once it can't, it would keep the whole array up to date2019-05-08 12:15:01
 *won't be able2019-05-08 12:15:22
 well, for programmers there's now a workaround - via constexpr2019-05-08 12:16:26
<unic0rn> well, it's a matter of keeping the stack operations predictable2019-05-08 12:16:28
<Shelwien> nope, that's unrelated to stack tracking2019-05-08 12:16:52
<unic0rn> that is, my guess is the only option would be to predefine some words for common operations2019-05-08 12:16:53
 yes, it is.2019-05-08 12:17:00
 because once you've got only direct offsets into memory array, it should optimize it out2019-05-08 12:17:22
 things get problematic with stuff like pick2019-05-08 12:17:34
 because it doesn't know which element you'll want to read2019-05-08 12:17:48
<Shelwien> as i said (and demonstrated), it doesn't optimize away an array where it doesn't know even one element value2019-05-08 12:18:07
<unic0rn> but that can be avoided, by defining words doing the same thing with few predefined offsets and scrapping the general case2019-05-08 12:18:22
 not exactly following ans specification, but that may be the price2019-05-08 12:18:36
 that's my point2019-05-08 12:18:56
<Shelwien> and when an element is set to a value which has sufficiently complex dependence chain2019-05-08 12:18:59
 that's enough to stop it from optimizing away the array2019-05-08 12:19:13
<unic0rn> exactly. which is why each offset should be an immediate value2019-05-08 12:19:27
<Shelwien> simple branches would be okay2019-05-08 12:19:29
 but a variable-size loop won't2019-05-08 12:19:38
 same with some arithmetic operations like %2019-05-08 12:20:01
<unic0rn> unbalanced loops are out of the question, that much is obvious2019-05-08 12:20:03
<Shelwien> not unbalanced, just variable-length2019-05-08 12:20:19
 or fixed-length, but long enough2019-05-08 12:20:26
 for example, a loop like for( i=0,s=0; i<100; i++ ) s+=i;2019-05-08 12:20:46
 modern compilers can turn into constant2019-05-08 12:20:55
 because they have it in their pattern library2019-05-08 12:21:07
 but anything non-trivial won't be tracked2019-05-08 12:21:31
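Shelwien's example loop, in a compilable form; gcc and clang do recognize this shape and fold it to the closed form n*(n-1)/2 at compile time when `n` is a constant:

```c
#include <assert.h>

/* for( i=0,s=0; i<100; i++ ) s+=i; as a function: sum of 0..n-1.
   Modern compilers can replace the loop with the closed form. */
static int sum_below(int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += i;
    return s;
}
```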
<unic0rn> assuming s is top of the stack, that's just adding i to the top of the stack2019-05-08 12:21:52
<Shelwien> yes, and that one compilers would understand2019-05-08 12:22:32
<unic0rn> what i'm saying is, unless something is unbalanced, it's entirely predictable - and obvious - which parts are modified2019-05-08 12:22:42
<Shelwien> but how about this: for( i=0,s=0; s%100==0; i++ ) s+=i; ?2019-05-08 12:22:48
 its not unbalanced, right?2019-05-08 12:22:58
 well, in any case2019-05-08 12:23:30
<unic0rn> it's still just accessing top of the stack2019-05-08 12:23:32
<Shelwien> yes, but that's enough to trip the compiler2019-05-08 12:23:45
<unic0rn> maybe in C2019-05-08 12:23:53
<Shelwien> i mean that the compiler won't be able to understand that the result of that is constant2019-05-08 12:24:23
<unic0rn> doesn't have to. it can execute the loop, that's not what i'm after2019-05-08 12:24:47
 but it will optimize out the stack2019-05-08 12:25:03
 s will be a register2019-05-08 12:25:10
 i most likely too, because it shouldn't have trouble tracking the return stack, unless it becomes unbalanced somehow, and that shouldn't happen2019-05-08 12:25:51
<Shelwien> ah, that won't be a problem, sure2019-05-08 12:26:20
<unic0rn> precisely2019-05-08 12:26:25
 and that's where the biggest performance gain is, imho2019-05-08 12:26:35
<Shelwien> but it won't optimize away the array2019-05-08 12:26:37
 and then it would have to store element values to the array2019-05-08 12:27:01
<unic0rn> well, if it will optimize out the stack, then effectively it will optimize out its array2019-05-08 12:27:20
<Shelwien> so tight loops, yeah2019-05-08 12:27:20
 but any complex code would still work with stack array all the time2019-05-08 12:27:38
<unic0rn> no reason.2019-05-08 12:27:58
<Shelwien> also as I already mentioned, you can't inline everything2019-05-08 12:28:08
<unic0rn> even complex code shouldn't keep much on the stack2019-05-08 12:28:20
 or rather, shouldn't keep TOO MUCH2019-05-08 12:28:26
<Shelwien> at least the internal loop shouldn't be more than 32k of native code2019-05-08 12:28:35
 because that's the size of L1 code cache2019-05-08 12:28:46
 and once you have more than that, processing becomes 10x slower2019-05-08 12:29:07
<unic0rn> i'll see later when i'll be coding it, how things look with jumping around inside a single function2019-05-08 12:29:22
 in llvm ir2019-05-08 12:29:26
<Shelwien> sure :)2019-05-08 12:29:33
<unic0rn> because as long as it's a single function, that would be perfect2019-05-08 12:29:37
 doesn't mean i have to copy each word 10 times around2019-05-08 12:29:44
 as for stack, even complex code accesses top 2-3 elements, sometimes 42019-05-08 12:30:38
 the rest is in allocated memory and global variables2019-05-08 12:30:51
 there's really no need for very deep stack, as long as one can keep himself from abusing the stack, which happened to me in the beginning and i was easily loosing track of it2019-05-08 12:31:46
 factoring helps2019-05-08 12:32:06
 and factoring obviously doesn't make the stack more shallow, but then, how deeply can the code go2019-05-08 12:32:23
<Shelwien> well, i have 4k stack for coroutines, usually is enough2019-05-08 12:32:44
<unic0rn> i doubt my compression reaches 10 elements on the stack at any time2019-05-08 12:33:15
 hell, i wouldn't be surprised if it was more around 52019-05-08 12:33:28
<Shelwien> well, it only really accumulates with nesting depth2019-05-08 12:34:15
<unic0rn> yeah2019-05-08 12:36:24
 but then, how much nesting can you do, in a single file2019-05-08 12:37:04
<Shelwien> i still don't understand the benefits of forth, though :)2019-05-08 12:37:11
<unic0rn> to be sure things get optimized properly though, i'll probably have to implement compiling separate files into separate objects2019-05-08 12:37:34
 and linking it all together later2019-05-08 12:37:39
<Shelwien> %)2019-05-08 12:37:45
<unic0rn> because throwing everything together may be risky2019-05-08 12:37:52
 as for benefits, how about keeping me from coding a groundbraking compression by talking about compilers ;)2019-05-08 12:39:37
 thanks for all the insight by the way, very useful2019-05-08 12:39:55
<Shelwien> ok :)2019-05-08 12:40:12
 groundbraking :)2019-05-08 12:40:32
<unic0rn> unfortunately, can't get back to coding right now, because of upcoming appointment, so i'll be back later2019-05-08 12:40:37
 LOL2019-05-08 12:40:39
 i didn't have time to grab another coffee and it shows :P2019-05-08 12:41:00
<Shelwien> well, there's https://handbrake.fr/2019-05-08 12:41:42
<unic0rn> yeah, i prefer ffmpeg. as for forth on llvm, i wonder how it'll do with generating gpu code. never played with programming those, i guess we'll see. but that has time, right now gotta go2019-05-08 12:46:47
<Shelwien> gpu code is basically the same as current x862019-05-08 12:47:56
 vectors, threads2019-05-08 12:47:59
 so nothing that different, i think2019-05-08 12:48:06
*** maniscalco_ has left the channel2019-05-08 13:01:33
 !next2019-05-08 14:40:43