Off-and-on trying out an account over at @[email protected] due to scraping bots bogging down lemmy.today to the point of near-unusability.

  • 48 Posts
  • 2.84K Comments
Joined 2 years ago
Cake day: October 4th, 2023

  • For some workloads, yes. I don’t think that the personal computer is going to go away.

    But it also makes a lot of economic and technical sense for some of those workloads.

    Historically — like, think up to about the late 1970s — useful computing hardware was very expensive. And most people didn’t have a requirement to keep computing hardware constantly loaded. In that kind of environment, we built datacenters and it was typical to time-share them. You’d use something like a teletype or some other kind of thin client to access a “real” computer to do your work.

    What happened at the end of the 1970s was that prices came down enough and there was enough capability to do useful work to start putting personal computers in front of everyone. You had enough useful capability to do real computing work locally. They were still quite expensive compared to the great majority of today’s personal computers:

    https://en.wikipedia.org/wiki/Apple_II

    The original retail price of the computer was US$1,298 (equivalent to $6,700 in 2024)[18][19] with 4 KB of RAM and US$2,638 (equivalent to $13,700 in 2024) with the maximum 48 KB of RAM.

    But they were getting down to the point where they weren’t an unreasonable expense for people who had a use for them.

    At the time, telecommunications infrastructure was much more limited than it is today, so using a “real” computer remotely from many locations was a pain, which also made the PC make sense.

    From about the late 1970s to today, the workloads that have dominated most software packages have been more-or-less serial computation. While “big iron” computers could do faster serial compute than personal computers, it wasn’t radically faster. Video games with dedicated 3D hardware were a notable exception, but those were latency sensitive and bandwidth intensive, especially relative to the available telecommunication infrastructure, so time-sharing remote “big iron” hardware just didn’t make a lot of sense.

    And while we could — and to some extent, did — ramp up serial computational capacity by using more power, there were limits on the returns we could get.

    However, what AI stuff represents has notable differences in workload characteristics. AI requires parallel processing. AI uses expensive hardware. We can throw a lot of power at things to get meaningful, useful increases in compute capability.

    • Just like in the 1970s, the hardware to do competitive AI stuff for many of the things that we want to do is expensive. Some of that is just short-term, like the fact that we don’t have the memory manufacturing capacity in 2026 to meet demand, so prices will rise until enough people are priced out that the available chips go to the highest bidders. That’ll resolve itself one way or another, like via a buildout of memory manufacturing capacity. But some of it is also that the quantities of memory involved are just plain expensive. Even at pre-AI-boom prices, if you want the kind of memory that it’s useful to have available — hundreds of gigabytes — you’re going to be significantly increasing the price of a PC, and that’s before whatever the cost of the computation hardware is.

    • Power. Currently, we can usefully scale out parallel compute by using a lot more power. Under current regulations, a laptop that can go on an airline in the US can have a 100 Wh battery and a 100 Wh spare, separate battery. If you pull 100 W on a sustained basis, you blow through a battery like that in an hour. A desktop can go further, but is limited by heat and cooling, is going to start running into a limit for US household circuits at something like 1800 W, and is going to be dumping a very considerable amount of heat into the house at that point. Current Nvidia hardware pulls over 1 kW. A phone can’t do anything like any of the above. The power and cooling demands range from totally unreasonable to at least somewhat problematic. So even if we work out the cost issues, I think that it’s very likely that the power and cooling issues will be a fundamental bound.

    In those conditions, it makes sense for many users to stick the hardware in a datacenter with strong cooling capability and time-share it.

    Now, I personally really favor having local compute capability. I have a dedicated computer, a Framework Desktop, to do AI compute, and also have a 24GB GPU that I bought in significant part to do that. I’m not at all opposed to doing local compute. But at current prices, unless that kind of hardware can provide a lot more benefit to most people than it currently does, most people probably aren’t going to buy local hardware.

    If your workload keeps the hardware active 1% of the time — and use as a chatbot might well look like that — then having the hardware time-shared is something like a hundred times cheaper in terms of hardware cost. If the hardware is expensive — and current Nvidia hardware runs tens of thousands of dollars, too rich for most people’s taste unless they’re getting Real Work done with the stuff — it looks a lot more appealing to time-share it.

    There are some workloads for which there might be constant load, like maybe constantly analyzing speech, doing speech recognition. For those, then yeah, local hardware might make sense. But…if weaker hardware can sufficiently solve that problem, then we’re still back to the “expensive hardware in the datacenter” thing.

    Now, a lot of Nvidia’s costs are going to be fixed, not variable. And assuming that AMD and so forth catch up, prices in a competitive market will come down — with scale, one can spread fixed costs out, and only the variable costs place a floor on hardware prices. So I can maybe buy that, if we hit limits that mean that buying a ton of memory isn’t very interesting, prices will come down. But I am not at all sure that the “more electrical power provides more capability” aspect will change. And as long as that holds, it’s likely going to make a lot of sense to use “big iron” hardware remotely.

    What you might see is a computer on the order of, say, a 2022 computer on everyone’s desk…but with a lot of parallel compute workloads farmed out to datacenters, which have computers more capable of doing that parallel compute.

    Cloud gaming is a thing. I’m not at all sure that the cloud will dominate there, even though it can leverage parallel compute. There, latency and bandwidth are real issues. You’d have to put enough datacenters close enough to people, and run enough fiber, to make that viable. And I’m not sure that we’ll ever reach the point where it makes sense to do remote compute for cloud gaming for everyone. Maybe.

    But for AI-type parallel compute workloads, where the bandwidth and latency requirements are a lot less severe, and the useful returns from throwing a lot of electricity at the thing are significant…then it might make a lot more sense.

    I’d also point out that my guess is that AI probably will not be the only major parallel-compute application moving forward. Unless we can find some new properties in physics or something like that, we just aren’t advancing serial compute very rapidly any more; things have slowed down for over 20 years now. If you want more performance, as a software developer, there will be ever-greater relative returns from parallelizing problems and running them on parallel hardware.

    I don’t think that, a few years down the road, building a computer comparable to the one you might have bought in 2024 is going to cost more than it did in 2024. I think that people will have PCs.

    But those PCs might be running software that does an increasing amount of its parallel compute in the cloud, as the years go by.


  • So an internet

    The highest data rate that LoRa looks to support in North America is 21,900 bits per second, so you’re talking about 21.9 kbps, or roughly 2.7 kB/s in a best-case scenario. That’s less than half of what an analog telephone modem could achieve.
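
    To put that in perspective, here’s a quick back-of-the-envelope with bc (the 2 MB figure is just an assumption about a typical modern web page):

    $ # bytes per second at 21,900 bits per second
    $ echo "scale=1; 21900/8" | bc
    2737.5
    $ # minutes to move a 2 MB page at that rate
    $ echo "scale=1; (2*1024*1024) / (21900/8) / 60" | bc
    12.7

    So a single web page would take on the order of ten minutes, best case.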

    It’s going to be pretty bandwidth-constrained, which also limits how much traffic you can route around the mesh.

    I think that the idea of a “public access, zero-admin mesh Internet over the air” isn’t totally crazy, but that it’d probably need to use something like laser links and hardware that can identify and auto-align to other links.



  • GitHub explicitly asked Homebrew to stop using shallow clones. Updating them was “an extremely expensive operation” due to the tree layout and traffic of homebrew-core and homebrew-cask.

    I’m not going through the PR to understand what’s breaking, since it’s not immediately apparent from a quick skim. But here are three possible problems, based on what people are mentioning there:

    The problem is the cost of the shallow clone

    Assuming that the workload here is always --depth=1, that they aren’t doing commits at a high rate relative to clones, and that producing the shallow clone is what’s expensive for git, I’d think that a better solution for GitHub would be some patch to git that lets it cache a depth=1 shallow clone for a given commit hash.

    The problem is the cost of unshallowing the shallow clone

    If the actual problem isn’t the shallow clone itself, that is, a regular clone would be fine but unshallowing is the problem, then a patch to git that allows more-efficient unshallowing would be a better solution. I mean, I’d think that unshallowing should only need a time-ordered index of commits and the blobs they reference up to a given point. That shouldn’t be that expensive for git to maintain, if it doesn’t already have it.

    The problem is that Homebrew has users repeatedly unshallowing a clone off GitHub and then blowing it away and repeating

    If the problem is that people keep repeatedly doing a clone off GitHub — that is, a regular, non-shallow clone would also be problematic — I’d think that a better solution would be to have Homebrew keep a local bare clone as a cache, just fetch into that cache, and then use it as a reference when creating the new clone. If Homebrew uses the fresh clone as read-only and the cache can be relied upon to remain, then they could use --reference alone. If not, then add --dissociate. I’d think that that’d lead to better performance anyway.
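
    Something like this, roughly (the cache path here is just for illustration):

    $ # one-time: keep a bare mirror of the tap as a local object cache
    $ git clone --mirror https://github.com/Homebrew/homebrew-core.git ~/.cache/homebrew-core.git
    $ # each update: refresh the cache, then clone cheaply against it
    $ git -C ~/.cache/homebrew-core.git fetch origin
    $ git clone --reference ~/.cache/homebrew-core.git --dissociate \
          https://github.com/Homebrew/homebrew-core.git /tmp/homebrew-core

    With --reference, the new clone borrows objects from the cache instead of re-downloading them; --dissociate then copies them in so that the result doesn’t depend on the cache sticking around.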






    Notably, this and dotfiles are popular among devs using Macs, since macOS has nearly all settings available either via config files or via the defaults system from the command line. In comparison, Windows is total ass about configuring via the command line, and even Cinnamon gives me headaches by either not reloading my settings or straight up overwriting them.

    The application-level format isn’t really designed for end-user consumption, but WINE uses a text representation of the Windows registry. I imagine that one could probably put that in a git repository, and that there’s some way to apply it to a Windows registry. Or maybe a collection of .reg files, which are also text.
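
    A rough sketch of what I mean (not a tested workflow, and “SomeApp” is just a placeholder):

    $ # WINE already keeps its registry as text files under the prefix,
    $ # so those can go straight into a git repository
    $ cd ~/.wine
    $ git init
    $ git add user.reg system.reg
    $ git commit -m "snapshot of the WINE registry"
    $ # on the Windows side, reg export / reg import give you text .reg files
    $ # for a subtree that you could version the same way, e.g.:
    $ #   reg export "HKCU\Software\SomeApp" someapp.reg
    $ #   reg import someapp.reg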



  • Well…I’m agreeing that it happened and was a factor, but also pointing out that the “don’t let black people have guns” practice predated the Black Panthers stuff by a considerable amount of time.

    EDIT: Basically, a major concern in the US in the runup to the American Civil War was the prospect of a slave uprising. There were a lot of black people in the US who had been kept as slaves and were not super happy about the fact.

    At about the same time, in Haiti, there had been such an uprising.

    https://en.wikipedia.org/wiki/Haitian_Revolution

    Shortly after the revolution:

    https://en.wikipedia.org/wiki/1804_Haitian_massacre

    The 1804 Haiti massacre was carried out by Haitian rebel soldiers, mostly former slaves, under orders from Jean-Jacques Dessalines[1][2][3][4] against much of the remaining European population in Haiti, which mainly included French Colonists.[5][6] The Haitian Revolution defeated the French army in November 1803 and the Haitian Declaration of Independence happened on 1 January 1804.[7]

    Throughout the early-to-mid nineteenth century, the events of the massacre were well known in the United States. Additionally, many Saint Domingue refugees moved from Saint-Domingue to the U.S., settling in New Orleans, Charleston, New York, Baltimore, and other coastal cities. These events spurred fears of potential uprisings in the Southern U.S. and they also polarized public opinion on the question of the abolition of slavery.[9][10]

    At the time of the American Civil War, a major pretext for Southern whites, most of whom did not own slaves, to support slave owners (and ultimately fight for the Confederacy) was fear of a slave uprising similar to the Haitian Revolution.[34] The perceived failure of abolition in Haiti and Jamaica were explicitly referred to in Confederate discourse as a reason for secession.[35] The slave revolt was a prominent theme in the discourse of Southern political leaders and had influenced U.S. public opinion since the events took place. Historian Kevin Julius writes:

    As abolitionists loudly proclaimed that “All men are created equal”, echoes of armed slave insurrections and racial genocide sounded in Southern ears. Much of their resentment towards the abolitionists can be seen as a reaction to the events in Haiti.[9]

    In the run-up to the U.S. presidential election of 1860, Roger B. Taney, Chief Justice of the Supreme Court, wrote “I remember the horrors of St. Domingo” and said that the election “will determine whether anything like this is to be visited upon our own southern countrymen.”[10]

    Abolitionists recognized the strength of this argument on public opinion in both the North and South. In correspondence to the New York Times in September 1861 (during the war), an abolitionist named J. B. Lyon addressed this as a prominent argument of his opponents:

    We don’t know any better than to imagine that emancipation would result in the utter extinction of civilization in the South, because the slave-holders, and those in their interest, have persistently told us … and they always instance the ‘horrors of St. Domingo.’[36]

    Lyon argued, however, that the abolition of slavery in the various Caribbean colonies of the European empires before the 1860s showed that an end to slavery could be achieved peacefully.[37]

    John Brown attempted to induce such a slave revolt:

    https://en.wikipedia.org/wiki/John_Brown's_raid_on_Harpers_Ferry

    From October 16th to 18th, 1859, American abolitionist John Brown attempted to initiate a slave revolt in Southern states by raiding an armory[nb 1] in Harpers Ferry, Virginia (now West Virginia). The raid is frequently cited as one of the primary causes of the American Civil War.[3]

    And you had Nat Turner’s Rebellion:

    https://en.wikipedia.org/wiki/Nat_Turner's_Rebellion

    Nat Turner’s Rebellion, historically known as the Southampton Insurrection, was a slave rebellion that took place in Southampton County, Virginia, in August 1831. Led by Nat Turner, the rebels, made up of enslaved African Americans, killed between 55 and 65 White people, making it the deadliest slave revolt for the latter racial group in U.S. history.

    So you have the situation after the American Civil War where you have a lot of now-free black people to whom the US Constitution guarantees the right to arms…and a lot of white people really worried about what happens if they get ahold of said arms. They went out and tried to figure out whatever loopholes they could to make sure that blacks didn’t have access to firearms.


  • The Black Panthers incident that you’re referring to:

    https://en.wikipedia.org/wiki/Mulford_Act

    The Mulford Act is a 1967 California statute which prohibits public carrying of loaded firearms without a permit.[2] Named after Republican assemblyman Don Mulford and signed into law by governor of California Ronald Reagan, the law was initially crafted with the goal of disarming members of the Black Panther Party, which was conducting armed patrols of Oakland neighborhoods in what would later be termed copwatching.[3][4] They garnered national attention after Black Panthers members, bearing arms, marched upon the California State Capitol to protest the bill.[5][6]

    But also, going back prior to that:

    https://en.wikipedia.org/wiki/Saturday_night_special

    The earliest law prohibiting inexpensive handguns was enacted in Tennessee, in the form of the “Army and Navy Law”, passed in 1879, shortly after the 14th amendment and Civil Rights Act of 1875; previous laws invalidated by the constitutional amendment had stated that black freedmen could not own or carry any manner of firearm. The Army and Navy Law prohibited the sale of “belt or pocket pistols, or revolvers, or any other kind of pistols, except army or navy pistols”, which were prohibitively expensive for black freedmen and poor whites to purchase.[21] These were large pistols in .36 caliber (“navy”) or .44 caliber (“army”), and were the military issue cap and ball black-powder revolvers used during the Civil War by both Union and Confederate ground troops. The effect of the law was to restrict handgun possession to the upper economic classes.[22]

    The next major attempt to regulate inexpensive firearms was the Gun Control Act of 1968, which used the “sporting purposes” test and a points system to exclude many small, inexpensive handguns which had been imported from European makers such as Röhm, located in Germany.


    Oh, yeah, it’s not that ollama itself is opening holes (other than adding something listening on a local port), or telling people to do that, or that the ollama team is explicitly promoting bad practices. I’m just saying that I’d guess that there are a number of people who are doing things like fully-exposing or port-forwarding to ollama or whatever because they want to be using the parallel compute hardware on their computer remotely. The easiest way to do that is to just expose ollama without setting up some kind of authentication mechanism, so…it’s gonna happen.

    I remember someone on here who had their phone and desktop set up so that they couldn’t reach each other by default. They were fine with that, but they really wanted their phone to be able to access the LLM on their computer, and I was helping walk them through it. It was hard and confusing for them — they didn’t really have a background in the stuff, but badly wanted the functionality. In their case, they just wanted local access, while the phone was on their home WiFi network. But…I can say pretty confidently that there are people who want access all the time, to access the thing remotely.



    The incident began in June 2025. Multiple independent security researchers have assessed that the threat actor is likely a Chinese state-sponsored group, which would explain the highly selective targeting observed during the campaign.

    I do kind of wonder about the emacs package management infrastructure. Like, whether attacking things that text editors use online is an actively-used vector.


  • If the chips are just being hoarded to shut out competitors, which is what the OpenAI deal was rumoured to be about, we could see the unused chips getting bought and used, but equally likely we could see the chips (and dimms and cpus) deliberately shredded to prevent them falling into competitor hands.

    By the time the things are cycled out, they may not be terribly compute-competitive, in which case…shrugs

    Also, a major unknown is where models go. Say that a bunch of people decide that they can’t get access to parallel compute hardware or a lot of memory, and research shifts toward models split up into MoEs or otherwise broken apart. Recent LLM models have been oriented towards MoEs. Llama.cpp, and I assume the other engines capable of running LLMs, has the ability to offload experts that don’t fit in GPU memory to main memory when they aren’t actively being used. Then maybe…having a bank of consumer-level 24GB GPUs or something like that is fine, and having chips with direct access to very large amounts of memory isn’t all that interesting. Then, what becomes essential to being competitive changes.
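
    As a concrete illustration, something along these lines (the model path is made up, and the flag spellings are from recent llama.cpp builds, so check llama-server --help on yours):

    $ # keep the MoE expert tensors in system RAM, put everything else on the GPU
    $ llama-server -m some-moe-model.gguf \
          --n-gpu-layers 999 \
          --override-tensor "exps=CPU"

    The experts hold the bulk of the parameters but only a few are active per token, so this trades some speed for being able to run a model much larger than VRAM.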

    EDIT: I also think that it’s safe to say that more memory will probably always be a benefit. But I’ll also say that it is probably very likely that our existing models are staggeringly inefficient with memory. We are still doing early passes at this.

    Let me give you an example. I have a Llama 3-based model currently loaded on my Framework Desktop that’s using 96GB of memory for the model and associated storage.

    Prompt: What is 1+1?

    Response: I can answer that. 1+1 = 2.

    Prompt: How about 37 times 12?

    Response: 37 times 12 is 444.

    Now, those are correct answers. But…in order to make an LLM capable of providing that correct response, purely by running a neural net trained on natural language, we had to stick a really inefficient amount of data into memory. That same hardware that I’m running it on has the ability to do billions of integer computations per second. As of today, the software running that model doesn’t provide it access to that hardware, and the model was never trained to use it. But…it could be. And if it were, suddenly a lot of the need to store neural-net edges wasted on arithmetic goes away.

    Plus, we could get better results:

    Prompt: What about 783901/76523?

    Response: 783901 divided by 76523 is approximately 10.23.

    That’s not far off — the true value is about 10.244 — but it should have been rounded to 10.24.

    $ maxima -q
    
    (%i1) float(783901/76523);
    
    (%o1)                         10.243992002404506
    (%i2) 
    

    So we could probably get more-useful models that don’t waste a ton of space if we gave the model access to the computational hardware that’s presently sitting idle and trained it to use it. That’s an off-the-cuff example, but I think that it highlights how we’re solving problems inefficiently in terms of memory.

    Same sort of thing with a lot of other problems for which we already have immensely-more-efficient (and probably more accurate) software packages. If you can train the model to use those, running the software in an isolated sandbox rather than trying to do everything itself, then we don’t need to blow space in the LLM on those capabilities, and it can shrink.

    If we reduce the memory requirements enough to solve a lot of problems that people want with a much-smaller amount of memory, or with a much-less-densely-connected set of neural networks, the hardware that people care about may radically change. In early 2026, the most-in-demand hardware is hugely-power-hungry parallel processors with immense amounts of memory directly connected to it. But maybe, in 2028, we figure out how to get models to use existing software packages designed for mostly-serial computation, and suddenly, what everyone is falling over themselves to get ahold of is more-traditional computer hardware. Maybe the neural net isn’t even where most of the computation is happening for most workloads.

    Maybe the future is training a model to use a library of software and to write tiny, throwaway programs that run on completely different hardware optimized for this “model scratch computation” purpose, with the model mostly consulting those.

    Lot of unknowns there.


  • I posted in a thread a bit back about this, but I can’t find it right now, annoyingly.

    You can use the memory on GPUs as swap, though on Linux, that’s currently through FUSE — going through userspace — and probably not terribly efficient.

    https://wiki.archlinux.org/title/Swap_on_video_RAM
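
    The approach on that page goes through the vramfs FUSE filesystem; roughly something like this (a sketch from memory, so check the page for the exact steps; the 4G size and paths are arbitrary):

    $ # mount a FUSE filesystem backed by VRAM, put a swap file on it, and
    $ # attach it through a loop device, since swap can't sit directly on FUSE
    $ mkdir -p /tmp/vram
    $ vramfs /tmp/vram 4G &
    $ dd if=/dev/zero of=/tmp/vram/swapfile bs=1M count=4096
    $ sudo losetup /dev/loop0 /tmp/vram/swapfile
    $ sudo mkswap /dev/loop0
    $ sudo swapon /dev/loop0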

    Linux apparently can use it via HMM: the memory will show up as system memory.

    https://www.kernel.org/doc/html/latest/mm/hmm.html

    Provide infrastructure and helpers to integrate non-conventional memory (device memory like GPU on board memory) into regular kernel path

    It will have higher latency due to the PCI bus. It sounds like it basically uses main memory as a cache, and all attempts to directly access a page on the device trigger an MMU page fault:

    Note that any CPU access to a device page triggers a page fault and a migration back to main memory. For example, when a page backing a given CPU address A is migrated from a main memory page to a device page, then any CPU access to address A triggers a page fault and initiates a migration back to main memory.

    I don’t know how efficiently Linux deals with this for various workloads; if it can accurately predict the next access, it might be able to pre-request pages and do this pretty quickly. That is, it’s not that the throughput is so bad, but the latency is, so you’d want to mitigate that where possible. There are going to be some workloads for which that’s impossible: an example case would be just allocating a ton of memory, and then accessing random pages. The kernel can’t mitigate the PCI latency in that case.

    There’s someone who wrote a driver to do this for old Nvidia cards, something that starts with a “P”, that I also can’t find at the moment, which I thought was the only place where it worked, but it sounds like it can also be done on newer Nvidia and AMD hardware. Haven’t dug into it, but I’m sure that it’d be possible.

    A second problem with using a card as swap is going to be that a Blackwell card uses extreme amounts of power, enough to overload a typical consumer desktop PSU. That power presumably only gets drawn if you’re using the compute hardware, which you wouldn’t be if you’re just moving memory around. I mean, existing GPUs normally use much less power than they do when crunching numbers. But if you’re running a GPU on a PSU that cannot actually provide enough power for it running at full blast, you have to be sure that you never actually power up that compute hardware.

    EDIT: For an H200 (141 GB memory):

    https://www.techpowerup.com/gpu-specs/h200-nvl.c4254

    TDP: 600 W

    EDIT2: Just to drive home the power issue:

    https://www.financialcontent.com/article/tokenring-2025-12-30-the-great-chill-how-nvidias-1000w-blackwell-and-rubin-chips-ended-the-era-of-air-cooled-data-centers

    NVIDIA’s Blackwell B200 GPUs, which became the industry standard earlier this year, operate at a TDP of 1,200W, while the GB200 Superchip modules—combining two Blackwell GPUs with a Grace CPU—demand a staggering 2,700W per unit. However, it is the Rubin architecture, slated for broader rollout in 2026 but already being integrated into early-access “AI Factories,” that has truly broken the thermal ceiling. Rubin chips are reaching 1,800W to 2,300W, with the “Ultra” variants projected to hit 3,600W.

    A standard 120V, 15A US household circuit can only handle 1,800W, and that’s if you keep it fully loaded. Even if you get a PSU capable of doing that and dedicate an entire household circuit to it, going beyond that means something like multiple PSUs on independent circuits, or 240V service, or something like that.

    I have a space heater in my bedroom that can do either 400W or 800W.

    So one question, if one wants to use the card just for extra memory, is going to be what ceiling on power usage you can guarantee while most of the card’s on-board hardware is idle.


  • I mean, the article is talking about providing public inbound access, rather than having the software go outbound.

    I suspect that in some cases, people just aren’t aware that they are providing access to the world, and it’s unintentional. Or maybe they just don’t know how to set up a VPN or SSH tunnel or some kind of authenticated reverse proxy or something like that, and want to provide public access for remote use from, say, a phone or laptop or something, which is a legit use case.
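
    For the “I just want to reach it from my own laptop or phone” case, an SSH tunnel is one way to get there without exposing the port publicly at all (a sketch; 11434 is ollama’s default port, and the hostname is made up):

    $ # forward a local port on the laptop to ollama on the home machine, over
    $ # authenticated, encrypted SSH; nothing new gets exposed to the internet
    $ ssh -N -L 11434:localhost:11434 user@home-machine
    $ # then point clients at http://localhost:11434 on the laptop as usual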

    ollama targets being easy to set up. I do kinda think that there’s an argument that maybe it should try to facilitate configuration for that setup, even though it expands the scope of what they’re doing, since I figure that there are probably a lot of people setting these up who don’t have much, say, networking familiarity and just want to play with local LLMs.

    EDIT: I do kind of think that there’s a good argument that the consumer router situation plus personal firewall situation is kind of not good today. Like, “I want to have a computer at my house that I want to access remotely via some secure, authenticated mechanism without dicking it up via misconfiguration” is something that people understandably want to do and should be more straightforward.

    I mean, we did it with Bluetooth: a consumer-friendly way to establish secure communication over insecure airwaves. We don’t really have that for accessing hardware remotely via the Internet.