So Deepseek just quietly released an open-source beast-at-math model (details inside)

CriticalResist8@lemmygrad.ml · 2 hours ago

I agree. let’s start with him

CriticalResist8@lemmygrad.ml · 4 days ago

I was actually thinking of having a deepseek api key for lemmygrad that people could freely top up and use so that we can all share, but I’m not sure how feasible it is on the trust system. I’ll take a closer look and make a post if there’s something I can set up

CriticalResist8@lemmygrad.ml · 4 days ago

I finished a guide yesterday, in the process of adding some pictures to it: https://lemmygrad.ml/post/9911266

It’s really not that complicated to install crush! I promise haha. If she is on Windows it’s literally copy and paste a command in powershell (winget install charmbracelet.crush), close powershell, open cmd, cd to a folder, type ‘crush’ and send, and it will open.

Then in crush panel type Thinking mode to get deepseek, hit enter,

Then make a deepseek API key and top it up with 5$ on platform.deepseek.com, copy API key, paste with Ctrl+Shift+V, hit enter, and it will just work. Try to send hello to make sure you’re connected.

CriticalResist8@lemmygrad.ml · 5 days ago

setting up an API connection is definitely a bit more involved, but it allows people to use it for their specific needs that the API devs may not have thought of. For example to translate all of prolewiki english to french, I set up an API access to mistral to use their servers/models to do the actual translation. Basically I sent a chunk of text through the API, it did its magic on mistral’s servers side of things, and then their servers return the translated chunk of text. My script saves that returned text to a document, without ever caring what was going on mistral’s side.

There are other programs for agent coding though I don’t have experience with them, but they might make it a bit easier for people to adopt the tools. I know Claude (Anthropic) has one that apparently works with models other than their own, and is a bit more graphical, i.e. you can use it with the mouse while crush is keyboard-only.

But try out crush, it works on any computer and I promise it’s not as scary as it looks haha (if that’s what’s holding you back). Once you’re connected it works every time you’ll use it from then on.

CriticalResist8@lemmygrad.ml · 5 days ago

They have an API: https://platform.deepseek.com/usage; log in with your deepseek account and everything is on the left sidebar (including documentation and creating the key).

I use the API with crush which is an agent, which is made to solve tasks/code, but can also just chat like the web interface. I posted an install guide for it in the sidebar of [email protected] because it’s that easy to set up lol, literally just 7 steps and it should work. With crush you just need to give it your deepseek API key when you first start crush and it should work.

Deepseek’s API is also very cheap, I put 5$ in a couple months ago and I haven’t finished them yet, and I used literally hundreds of millions of tokens lol. On GPT or Claude this would have lasted me all of 10 minutes.

CriticalResist8@lemmygrad.ml · 5 days ago

Deepseek: completely free and with 128k context window (chatpgt has a measly 16k or something)

Z-image turbo: completely free and can run on any modern GPU (6GB Vram for ggufs)

Huanyuan: open-source txt2vid model, completely free too

I think I know who’s winning…

CriticalResist8@lemmygrad.ml · 6 days ago

I think this is still the “old” OCR version, they released a new one a few months ago (https://huggingface.co/deepseek-ai/DeepSeek-OCR) that can pretty much OCR anything. It can also compress it and use it as context, which could unlock unlimited context.

Mind you OCR is only about 6 gigs so you could totally use it on most modern gpus. Yogthos is going to try and make it able to parse a full pdf which will be very useful to upload books to prolewiki with (I used to have a good OCR program but OS change and now it doesn’t work anymore…)

Wish it could come to their web interface too though 🥲

CriticalResist8@lemmygrad.ml · 6 days ago

I don’t think they’ll put it on the web version, at least not yet. A few months ago they released OCR which is pretty amazing at turning pictures into text, but did not roll it out to their web interface either. Math-v2 is a completely different model and they’d have to load it on their servers so that’s probably why they don’t include it. They might have API access for it though eventually, paid of course.

The full model is 185 gigs which means as many gigs of Vram to run it, so I don’t think we’ll see it on consumer hardware anytime soon… however the model is released so labs and universities (aka centers that have that kind of Vram available) can run it right now, by downloading the safetensors files from huggingface and - so to answer that question, yes it exists and is available right now!

But with the way Chinese innovation is going in AI, I expect by the end of 2026-27 we will be able to run these kinds of models on our computers, without doing heavy quantization or distills that basically destroy the model’s capabilities. They are already finding new ways to quantize models without losing accuracy in China. If you think about it it’s only existed commercially for 3 years and we already have gguf optimizations so, it’s only progressing haha

CriticalResist8@lemmygrad.ml · 6 days ago

flux.dev is/was nice, but it was also the only one they allowed to be used locally. I guess their business model was to allow a small version for consumer use and hope it provides free marketing? I don’t really see the point of even running flux dev now since we have z-image, though it does seem to be a bit worse at text writing (but you will probably be able to use another checkpoint for that soon if I understand how that works).

CriticalResist8@lemmygrad.ml · 6 days ago

So Deepseek just quietly released an open-source beast-at-math model (details inside)

CriticalResist8@lemmygrad.ml · 6 days ago

We’ll make mixtapes like in the old days except instead of recording the radio we’ll be recording our speakers

CriticalResist8@lemmygrad.ml · edit-2 7 days ago

AI music generation startup Suno settles lawsuit to join with Warner group -- what this means exactly for IP (and I was right)

CriticalResist8@lemmygrad.ml · 8 days ago

Suno is already dead anyway… they made a deal with Warner and this happened:

Which I predicted in my essay about AI and IP but I would just be tooting my own horn (pardon the pun). More notably they spend a lot of time talking about the download options so you get distracted from the actual news, that they are retiring the “current” models (4.5 and 5, their current top of the line) to make way for licensed models, which means exactly what you think it means - I’ll make a post about it actually because this is kinda big.

CriticalResist8@lemmygrad.ml · edit-2 10 days ago

how do people still get taken in by that 😭 if the llm is not actively generating it’s not doing anything!

edit: either that or the LLM was thinking in circles lol, happened to me once

CriticalResist8@lemmygrad.ml · 14 days ago

Austrian economics.jpeg

CriticalResist8@lemmygrad.ml · 17 days ago

a joke?

CriticalResist8@lemmygrad.ml · 20 days ago

Researchers question Anthropic claim that AI-assisted attack was 90% autonomous

CriticalResist8@lemmygrad.ml · 22 days ago

And it’s only going to get faster…

(Note that this is the Q4 model, but otherwise still the full 671b params and thinking mode enabled)

CriticalResist8@lemmygrad.ml · 26 days ago

https://digitalspaceport.com/how-to-run-deepseek-r1-671b-fully-locally-on-2000-epyc-rig/ it’s even better, this rig doesn’t even have a GPU.

CriticalResist8@lemmygrad.ml · 26 days ago

it’s not any better than perplexity if you want to use it as a search engine in my experience. but to do tech stuff like debugging it works

CriticalResist8@lemmygrad.ml · edit-2 27 days ago

I understand where you’re coming from, but I think there might be some misconceptions about the resource requirements. You can actually host LLMs on a local computer without needing a $10,000 GPU. For example, it’s possible to self-host the full Deepseek model on a $2000 setup and open it to your organization for browser-based use, or smaller models on a 400$ GPU.

I also find it compelling that LLMs like Deepseek are designed to be very efficient in their cloud versions, especially when compared to Western tech that isn’t incentivized to prioritize environmental concerns because there are no mechanisms in place to force them to care about the environment. This (the fact that capitalism won’t save the environment) is a much stronger argument than a blanket “no datacenters,” since a datacenter is powering Lemmygrad as we speak. To put it in perspective, China has about 450 datacenters while the US has over 4000, yet their tech sector is just as advanced. It shows there are different, more efficient ways of doing things that we (the state) can tap into if we only wanted to.

This also seems like it could erode trust in Communist organizations

To be perfectly honest, I think you overestimate the existing level of trust the general masses have in communist organizations.

I’m coming from a place of wanting our movements to succeed globally, it’s just that it worries me when I see us hesitating to adopt tools that could give us a real edge. We already use technology, including the internet and automated stuff in our organizational work. I believe we need to move past a certain hesitation toward new tech (a sort of “return to Pravda” mindset) and embrace whatever makes our praxis more effective. We don’t have the luxury of refusing efficient tools. Looking at how China integrates technology provides a practical, existing blueprint for this.

I’ve often seen proposals to automate tasks or improve efficiency in orgs get shut down with responses like, “Oh, that sounds complicated,” or “I like the way we do things already.” But we have to try new things if we want to close the gap. I’d be happy to help build out a tech stack if given the chance! And yet many still prefer to rely on manual email lists when a simple Telegram channel could coordinate communication.

It’s a bit like how the MIA gained its foothold in the 90s while other communists were still debating whether the internet was a fad. We got shown up by trots!

Just recently, we launched a Telegram broadcast channel with ProleWiki to share news. It’s only the first week, and we already have 80 subscribers. That’s 80 people we can reach directly, without being subject to algorithmic filters. The bot for the channel was coded by our dev with some LLM assistance, it uses RSS feeds and custom filters to select the headlines we want and posts them automatically on a schedule. Eventually, we might use something like Deepseek to scrape sites that no longer offer RSS, and maybe even analyze the articles for relevance before posting. At this moment the channel runs automatically, it requires literally 0 labor to sustain. I’m not aware of any org that have a low-stakes, public-facing point of entry like this. They seem to assume that the more labor they put into something the greater its impact and this results in a lot of wasted effort. This automated approach lets us maintain a presence with 0 effort while freeing up energy for other things we want to work on. It’s essentially self-sustaining, I mean, how cool is that!

perpetuate a surveillance state

I mean by many metrics China would be considered a surveillance state (and not just liberal metrics). They have a different cultural and legal approach to online privacy and device security, in fact researchers that work in China like that accessing data, even medical data, is more straightforward there. Our distrust should be directed at capitalism, not the ‘surveillance’ itself.

CriticalResist8@lemmygrad.ml · edit-2 27 days ago

deepseek cloud tbh. 5$ on the API gets you around 9 million words (input tokens are half the price). I have no idea what I’m gonna do with that amount of tokens but I’m probably going to be riding that 5$ for years to come lol. I also like that they don’t have auto billing and you have to top up your credit balance manually.

deepseek also has a 128,000 tokens context window which is just huge, that’s like 100k words. You could basically send deepseek a whole novel (60k words) and it will still have 30k words with which to write an output. But due to how context currently works in llms it will probably get confused or completely forget some parts of it, so I wouldn’t recommend doing that. But compare to chatGPT which gets lost and automatically cuts off your prompt after 3 paragraphs.

However don’t think deepseek is secure. Your data is still stored on your account and has to transit over the open web even if it ends up in servers in China. With local LLMs you can set it to delete chat history as soon as you close the program, so once it’s generated, it’s gone.

As for some uses,

translation of theory that was previously not available in X language. With a script and API access you could just automate this and pump out translations to 10 languages as quickly as you find the books. Now that I have some deepseek API access I might actually work on a pipeline for this.
producing agitprop, incl for example images you can quickly print on flyers prior to a protest or event, or stickers. These are flyers that will often be discarded anyway so the image doesn’t need to be great. But I’m thinking for example of the Samuel L. Jackson edit (the L stands for Lenin), people loved it when it came out. It’s one of the top-rated post on r/AIart which is just wild to think about, it’s also the only political post there. We just need to find the right idea and then AI can execute.
coding, of course. any stuff that you need to automate for your org or project, deepseek can probably code
generating questions to chapter you are reading in book club (then go through a process of perfecting them every time you run the book club and eventually you’ll have a standard set of questions that you serve every time you run the book. In fact I kinda want to do that for ProleWiki and make them accessible so orgs don’t have to make their own individually)
submit drafts to steelman them prior to release, though we might want a fine-tuned model for that. Can also make sure language works and will speak to people but again I’m not sure llms are quite there yet. Maybe if you give it a persona whom you want the draft to speak to. But marxism being so specific and developed I don’t think llms can quite grasp it at the level veteran comrades do.
Create content cascade for your online agitation work. Submit your argument about a topical issue to the LLM and ask it to derive it into a tweet, an instagram story, a bullet-point list for a flyer etc. imo this is something we don’t do enough of, there’s no need to reinvent the wheel each time and we need automated tech stacks to handle this work.
deepseek gave me a surprising observation considering it comes from an LLM, “Problem: The right-wing dominates online meme culture, which shapes common sense.” even it is aware of this lol. It also offered this example solution “Generate an image in a photorealistic style of a billionaire like Jeff Bezos as a giant, bloated king sitting on a throne made of Amazon warehouses, while tiny workers struggle to hold him up”. I really liked the Angela Rayner rap someone in the UK made, if you didn’t see it a link is in one of my recent comments just ctrl f her name. With AI we can produce this content very, very quickly, the potential is unmatched. Open the floodgates of social media and let the communists out. We just need to do it correctly and for that you need to have a vision and guide the AI.
automated minutes taking, though I’m aware at this time it’s not perfect bc LLMs are still prone to hallucinating. speech-to-text models are pretty good at it though, it’s the summarizing part that needs work. But then an LLM could automatically create the cascade for members and with code (which the llm writes) post the bullet-points to your Signal or Telegram automatically. This simplifies admin work which I have done with my party and it sucks esp as so many orgs don’t have a tech stack whatsoever and do everything manually.
research! I use perplexity for this as it cites sources. When people ask me something and I don’t have an answer I ask perplexity to learn about it. You preferably guide it with keywords over giving it direct instructions. It has definitely helped me find sources that I knew existed but didn’t save and has also helped me research and write arguments for a LOT of my writing, more than people might think.
I think it can also help in analytics because it can do Python but you need data for that in the first place, which a lot of orgs don’t have because they’re not interested in getting it. With mixture-of-experts it can probably get very good results as a data analyst.
In the same vein, automated reports on stuff that you might want to keep an eye on. This doesn’t necessarily need AI but AI can more finely decide what to send you over ‘old-school’ regex filters. Send it an RSS feed periodically, it opens and analyzes the stories, then can send you an email that says “hey there might be potential here, check this out”.
Okay this is more ‘black hat’ but you can also scour the web with AI and find people that post publicly on social media complaining about their rent, wage etc. Then either it puts them in a pipeline for you to reach out to manually or the llm does it automatically within the boundaries you set. Either way you kinda need an llm there to analyze sentiment. And yes reaching out individually on social media works and it’s very underrated, people think the only thing you can do on those platforms is broadcast to your followers.

^ to ask deepseek I asked it to “First take the time to remember all that you can about party-building in the leninist tradition so that you can tailor your answer to solve actual usecases and real-world problems communist organizations face” (and yes I wrote leninist on purpose so as not to trigger the potential censor and make sure it accessed the right knowledge, I found you kinda have to speak to them in their language). I didn’t just ask it “how can communist organizations use AI to help their work”, in fact I actually tried that prompt just now and the quality is definitely lower. It tries to answer as best it can but with less input data to work with it doesn’t understand what you’re actually looking for, you have to communicate that.

And I think there are still lots of emergent uses to be discovered. Also with open-source models and interfaces orgs can already, today, host their own server and provide access to the web interface you can access from your browser at home. That way they can host an LLM for everyone in their org instead of everyone having to host their own.

CriticalResist8@lemmygrad.ml · 1 month ago

Chinese companies make the most popular free [open-source/weights] AI models

CriticalResist8@lemmygrad.ml · 1 month ago

It’s a vast answer that I’m not 100% finished with myself, but the premise imo is the same way many of these countries jumped through the adoption of the personal computer straight to smartphones. They didn’t have the ‘development’ (infrastructure, budget, industry etc) to support personal computers but once smartphones came around they modernized with those directly, and 4G too without even going through cable internet – 4G is super popular in Africa even in poorer areas and they’re investing in coverage.

In the same way they see AI as something they can adopt to help with their national challenges, for example healthcare to name just one – which is a very complex problem with brain drain, lack of infrastructure for people to get to the hospital, etc – so if they find a way to provide healthcare with AI somewhere in the process, they could treat more people more easily. Other industries are food, construction, education, electrification, etc.

From an economic perspective it reduces cost of production when you integrate it into the process and therefore can help countries under sanctions and embargos get more mileage out of what they do have available. H100 gpus are forbidden from being exported to China, they can only get H20 which have 20% of the capabilities, so they are developing their own alternative - probably using AI in the process to develop them faster (maybe not yet in chips directly but I know they’re using AI in other industries already). It’s helping stretch what they can access to get the most out of it. In the meantime, they are stretching these H20s like with alibaba’s new cloud algorithm which was posted on the grad some time ago, that reduced the resources load by 82% and therefore fewer GPUs needed to support their center.

Iran has recently published guidelines for AI usage in academia, and they now allow it provided you note the model you used, time used, and that you can prove you understand the topic. All of these countries are also very interested in open-source AI since they can develop on it and avoid one-sided proprietary deals. They have a need to “catch up” as fast as possible and see AI as a way to accelerate their development and close the gap with the imperial core.

And of course Cuba announced not so long ago it would make its own LLM, though I’m not sure where that is at currently.

We are still in the premises of it all of course, but that’s the trend I’m seeing. It’s difficult to find info from these countries about how they are using or plan to use AI right now, but I did find this news that Malaysia, Rwanda and the UAE have signed a strategic partnership to boost AI adoption in the global south: https://www.bernama.com/en/news.php?id=2451825

CriticalResist8@lemmygrad.ml · 1 month ago

It was revealed by the financial Times that in China at any time electricity production outpaces demand by 2-3x, + they are under trade embargo for nvidia chips so they have to be creative with what they have. For them and global South countries ai is potentially a way to nullify sanctions and provide, it’s existential. The race is already over, the west can’t compete with that.