LLMs are not the be-all and end-all. What other AI tech are you all using? Something generative? Something else?
It’s kinda fun to poke around with image and video generation using ComfyUI. There are all sorts of models, LoRAs, and settings to tune and play with.
I want to but I don’t think I have the hardware to support it. Need at least a decent GPU, right?
For playing around, no. My GPU is 9 years old and has 8GB VRAM. I generated my 384x512 profile picture locally with KoboldCPP (Vulkan) and the Z-Image-Turbo model.
I just regenerated my profile picture with KoboldCPP a bit differently. I used some clearer LoRA configurations than I did initially. For Z-Image-Turbo it’s CFG 4 and for Flux2 Klein 4b CFG 1, both used Euler. For Anime V1 LoRA I started with “anime illustrious,” and for Koni Animestyle LoRA with “anime_style,”.
384×512 × Z-Image-Turbo × Anime V1 × 8 steps (2min 42s):

384×512 × Z-Image-Turbo × Anime V1 × 25 steps (8min 16s):

384x512 × Flux2 Klein 4b × Koni Animestyle × 4 steps (22s):

384x512 × Flux2 Klein 4b × Koni Animestyle × 8 steps (35s):

384x512 × Flux2 Klein 4b × Koni Animestyle × 25 steps (1min 28s):

Upscaling it with RealESRGAN x4plus anime 6B upscaler only takes 10 seconds longer with this dimension.
KoboldCPP outputs how long generation takes like:
[11:09:15] Generating Image (8 steps)
|==================================================| 480/480 - 0.00MB/s
|==================================================| 480/480 - 0.00MB/s
|==================================================| 480/480 - 810.94MB/s
|==================================================| 480/480 - 0.00MB/s
|==================================================| 8/8 - 19.64s/it
[11:11:57] Generating Media Complete
Those are pretty good results! Unfortunately, I don’t have a GPU at all yet. Any recommendations on something capable yet affordable? What would you buy next?
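As a quick sanity check on the KoboldCPP log quoted above, the two timestamps and the per-step rate can be turned into totals. This is only a sketch: the helper name `elapsed_seconds` is made up for illustration, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> int:
    """Seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps and per-step rate copied from the log above.
total = elapsed_seconds("11:09:15", "11:11:57")
minutes, seconds = divmod(total, 60)
sampling = 8 * 19.64  # 8 steps at 19.64 s/it

print(f"wall clock: {total}s = {minutes}min {seconds}s")
print(f"sampling only: {sampling:.2f}s")
```

The wall-clock figure lines up with the 2min 42s quoted for the 8-step Z-Image-Turbo run; the few seconds beyond the sampling steps are presumably loading and decoding, though the log does not say so.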
I stick to what I have as long as it continues to work for me, so I haven’t looked into GPUs in the last few years. A new Radeon RX 580 still costs 200€; I paid 300€ 8 years ago. Looks like the RX 580 2048SP 8GB is actually one of the cheapest cards one can get currently, but I’m not entirely sure. (My GPU has 2304 shading units.) It’s so old that ROCm support was dropped, and you really need software that can do AI via Vulkan.
edit: looks like Intel Arc A750 is 5 years younger, has better performance and more memory bandwidth
Thanks! I’ll make sure to include Vulkan in the criteria for a new machine and I’ll look at Intel options. I’ve only been looking at NVIDIA. Even 4GB VRAM will be a huge upgrade for me.
All modern GPU drivers support Vulkan, at least on Linux. I don’t know how badly Vulkan does in comparison to CUDA, since I’ve never used that stuff. The bigger issue is software-side support. Ollama, llama.cpp and KoboldCPP all support Vulkan by now. ComfyUI doesn’t seem to support it.
Thanks! I’ve been using llama.cpp but I’m not married to it. What’s your opinion on used machines? Is it risky to get a used GPU, or do they not really just fail?
Ah yeah, I think you need a 6GB GPU for images and probably something like a 12GB GPU for videos.
I think you can run the models without one, but it just slows way down.
Agents. And finding sane ways to run them safely.
Yeah, big move towards agents. They’re based on LLMs, no?
Yeah. Basically they’re like giving an LLM a set of tools it can use on its own. It’s a lot like having an intern at a company. You don’t give them access to anything you don’t want destroyed. But you can give them simple/safe buttons to push.
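One way to picture the “simple/safe buttons” idea is an allowlist: the model can only ask for tools by name, and anything not on the list is refused. The sketch below is purely illustrative. The tool names, the `NOTES` store and `run_tool` are all made up here and don’t come from any particular agent framework.

```python
from typing import Callable, Dict

# Hypothetical in-memory data the agent is allowed to read.
NOTES = {"todo": "buy milk"}


def read_note(name: str) -> str:
    """Read-only lookup; returns an empty string for unknown notes."""
    return NOTES.get(name, "")


def word_count(text: str) -> str:
    """Harmless helper: counts whitespace-separated words."""
    return str(len(text.split()))


# The only "buttons" the agent may push.
SAFE_TOOLS: Dict[str, Callable[[str], str]] = {
    "read_note": read_note,
    "word_count": word_count,
}


def run_tool(tool: str, argument: str) -> str:
    """Dispatch a tool request, refusing anything not on the allowlist."""
    if tool not in SAFE_TOOLS:
        return f"refused: {tool!r} is not an allowed tool"
    return SAFE_TOOLS[tool](argument)


print(run_tool("read_note", "todo"))
print(run_tool("delete_everything", "/"))
```

The point of the design is that the dangerous operation simply doesn’t exist from the agent’s side, rather than existing and being guarded by a prompt.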
For sure. I was wondering what people are doing beyond LLMs. Is there some next-gen take that’s coming out now?
At one point I had a weird obsession in making neural networks train only on uint_8. I tried:
0 to 255 is all you get. You know, “Real Programmers scorn floating point arithmetic.”
You want 16 bits? Make it an 8-bit overflow counter.
We don’t need divisions or multiplications when we have bit shifts.
My end goal was language models (probably not “large”) but I barely got to an acceptable MNIST classifier, after begrudgingly accepting that I should sully my 8-bit purity with negative numbers.
I still have that itch to scratch that I feel the process of gradient descent could be replaced by something better designed for the type of information we want to flow back (“move that thingie in that direction for the loss to go down”)
Basically I liked the idea of easy visualization and forcing myself to not use any sort of layer normalization (that I secretly suspect I never fully understood)
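For anyone wondering what the 8-bit tricks above could look like, here is a rough sketch in plain Python. It is not the poster’s code: `add_u8`, `Counter16` and `scale_pow2` are hypothetical names, and saturating at 255 is just one possible choice for handling overflow.

```python
from typing import Tuple


def add_u8(a: int, b: int) -> Tuple[int, int]:
    """Add two values in 0..255; return (sum modulo 256, carry bit)."""
    total = a + b
    return total & 0xFF, total >> 8


class Counter16:
    """A 16-bit accumulator built from a low byte plus an overflow counter."""

    def __init__(self) -> None:
        self.low = 0   # wraps around at 256
        self.high = 0  # counts how often the low byte overflowed

    def add(self, x: int) -> None:
        self.low, carry = add_u8(self.low, x)
        self.high = (self.high + carry) & 0xFF

    def value(self) -> int:
        return (self.high << 8) | self.low


def scale_pow2(x: int, shift: int) -> int:
    """Multiply (shift > 0) or divide (shift < 0) by a power of two using
    only bit shifts, saturating at 255 so the result stays in uint8 range."""
    y = x << shift if shift >= 0 else x >> -shift
    return min(y, 255)


acc = Counter16()
for _ in range(3):
    acc.add(200)
print(acc.value())         # 600
print(scale_pow2(100, 1))  # 200
print(scale_pow2(100, 2))  # 255 (saturated)
print(scale_pow2(100, -2)) # 25
```

The obvious limitation is that shifts only give scaling by powers of two, so anything like a learning rate has to be rounded to the nearest one.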
Wow most of this goes way over my head. I kind of understand the limiting to 8 bit method. I’ve heard about that being done elsewhere.
When you say negative numbers you’re talking about the embedding?
I still don’t understand gradient descent fully, but can you explain why you think it should be replaced and with what?
Thanks!
Ha ha no, I never went as far as needing embeddings for a language model. MNIST is actually, you know, the very simple classification model. It’s a bit the ‘Hello World’ of machine learning. It’s a dataset of handwritten digits that you have to classify in 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.
It is a good test because you can train it in minutes if not seconds even on crappy hardware and unoptimized code.
So when I’m talking about negative numbers, I am talking about negative numbers. I am talking about weights that are needed to be negative to have negative influence on the output. Like “this pixel in the center is white, so the likelihood of the number being zero decreases.”
I still don’t understand gradient descent fully, but can you explain why you think it should be replaced and with what?
Honestly, I’m just talking about it because we are being silly, but I am not sure that’s an idea I actually want to defend.
I just have this feeling that gradient descent is a good mathematical construction for what we try to achieve, but that mathematical purity maybe, just maybe, gets in the way of efficient computing. Of course, there are thousands of very competent, highly paid people who already explored that avenue, so I’m pretty sure that if something better was possible and within the reach of one person, it would already have been discovered.
(Counterpoint: we routinely rediscover things that were invented in the 90s and only become good ideas now that we have very good computing)
The thing is gradient descent is used to tell you in which direction you’re supposed to move a weight to lower the loss of your results. In other words, to minimize the error of your network.
Gradient or partial derivatives are like an ideal mathematical tool to do that. We are able to derive it for a lot of functions, linear or not, and it is a well-studied mathematical object, so it really makes sense to use that.
The direction of the gradient will tell you the direction in which the parameters need to move. More precisely, the partial derivative of a given parameter will tell you if you need to increase it or lower it in order for the loss to improve.
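To make that concrete, here is a toy example with a single parameter. The loss `(w - 3)^2` and the names `loss`, `dloss_dw` and `descend` are made up for illustration; real networks have millions of parameters, but each one is nudged by the same rule.

```python
def loss(w: float) -> float:
    """Toy loss with its minimum at w = 3."""
    return (w - 3.0) ** 2


def dloss_dw(w: float) -> float:
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def descend(w: float, learning_rate: float = 0.1, steps: int = 50) -> float:
    """Plain gradient descent: step against the derivative."""
    for _ in range(steps):
        w -= learning_rate * dloss_dw(w)
    return w


# Negative derivative: w is too small, so increasing it lowers the loss.
print(dloss_dw(0.0))  # -6.0
# Positive derivative: w is too large, so decreasing it lowers the loss.
print(dloss_dw(5.0))  # 4.0

print(descend(0.0))   # ends up very close to 3
```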
Thing is, we use the sign, that’s clear, but I am not sure the magnitude is that relevant. We keep fighting against things like the vanishing gradient problem, where very deep networks tend to have very small gradients, and we compensate for a lot of its problems through optimizers, design choices and tricks.
I wonder if there would not be a pure computer science way of just keeping track of the direction in which you want the parameter to change.
I don’t know, maybe triple all the calculations by also evaluating one tick in both directions? Or just use gradients on one bit when it makes sense? Or find a function that’s very fast to compute but that just approximates gradients and is simply better than randomness at finding the sign.
Like I said, that’s just an itch to scratch. That’s not a strong conviction that there is something. But if you were to give me two weeks salary to just work on that, I would be very happy to.
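Here is one possible reading of the “one tick in both directions” idea, as a sketch only. For each integer weight it evaluates the loss at `w - 1`, `w` and `w + 1` and keeps whichever is lowest, so no gradient is computed at all. The function names, the signed 8-bit range and the toy loss are assumptions made up for this example, not something from the comment above.

```python
from typing import Callable, List


def tick_probe_step(
    weights: List[int],
    loss: Callable[[List[int]], float],
    lo: int = -128,
    hi: int = 127,
) -> List[int]:
    """Update each integer weight by at most one tick, keeping the value
    among (w - 1, w, w + 1) that gives the lowest loss."""
    new = list(weights)
    for i in range(len(new)):
        best_w, best_loss = new[i], loss(new)
        for candidate in (new[i] - 1, new[i] + 1):
            if not lo <= candidate <= hi:
                continue  # stay inside the signed 8-bit range
            trial = new[:i] + [candidate] + new[i + 1:]
            trial_loss = loss(trial)
            if trial_loss < best_loss:
                best_w, best_loss = candidate, trial_loss
        new[i] = best_w
    return new


def toy_loss(w: List[int]) -> float:
    """Squared distance to a made-up target, standing in for a real loss."""
    target = [5, -3]
    return float(sum((a - b) ** 2 for a, b in zip(w, target)))


weights = [0, 0]
for _ in range(10):
    weights = tick_probe_step(weights, toy_loss)
print(weights)  # [5, -3]
```

The catch is cost: this needs up to three loss evaluations per weight per step, which is exactly what backpropagation avoids, so it only illustrates the sign-only idea and is not a practical replacement.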
What about quantum computing? That seems to be the next step. Much more efficient. Again, I’m totally out of my depth there as well. I only understand qubits at a very basic level. I don’t even really understand wave functions.
deleted by creator
Thanks for sharing! Yeah interesting community. Some people don’t like sharing ideas.
I skimmed through the video but not sure what the use of Grace would be. I can already ask an LLM to generate markdown for me. How do you use it?
deleted by creator
Haha that AI generated summary is even less appealing than the video. I’m not a coder so what I’m saying is the value of Grace doesn’t translate to me. Is it that using Grace code you’re able to output prompts that LLMs are better at processing? I’m asking for examples of how you are using it. Thanks!
deleted by creator
Doxxing?
Thanks for the explanation. Maybe I don’t understand, but I thought by nature LLMs were not deterministic. Is it because you’re breaking the output up into pieces?
deleted by creator
Can you give me a use case?



