Beta testing Stad.social

@[email protected]

  • 5 Posts
  • 37 Comments
Joined 1 year ago
Cake day: October 1st, 2023

  • My own. My Emacs config grew over years to several thousand lines, and it got to a point where I decided I could write an editor in fewer lines than it took to configure Emacs how I liked it. It’s … not for everyone. I’m happy with it, because it does exactly the things I want it to and nothing else, but it does also mean getting used to quirks you can’t be bothered to fix, and not getting to blame someone else when you run into a bug.

    That said, writing your own editor is easier than people think, as long as you leverage libraries for whichever things you don’t have a pressing need to customize. For example, mine is written in Ruby and I use Rouge for syntax highlighting, and I believe Rouge is more lines of code than the editor itself thanks to all the lexers.
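
    To give a rough idea of how small the core can be, here is a minimal sketch of a text buffer in plain Ruby. It is not the code from my editor: the `Buffer` class and its method names are made up for illustration, it assumes `insert` is only ever given text without newlines, and it leaves out rendering, key handling, undo and file I/O entirely.

    ```ruby
    # Hypothetical minimal text buffer: an array of lines plus a cursor position.
    class Buffer
      attr_reader :lines, :row, :col

      def initialize(text = "")
        # Keep trailing empty lines; an empty buffer still has one (empty) line.
        @lines = text.split("\n", -1)
        @lines = [""] if @lines.empty?
        @row = 0
        @col = 0
      end

      # Move the cursor, clamping it to valid positions in the buffer.
      def move_to(row, col)
        @row = row.clamp(0, @lines.size - 1)
        @col = col.clamp(0, @lines[@row].size)
      end

      # Insert text at the cursor (assumed to contain no newlines).
      def insert(str)
        @lines[@row] = @lines[@row].dup.insert(@col, str)
        @col += str.size
      end

      # Split the current line at the cursor and move to the start of the new line.
      def newline
        line = @lines[@row]
        @lines[@row, 1] = [line[0...@col], line[@col..]]
        @row += 1
        @col = 0
      end

      # Delete the character before the cursor, joining lines at column 0.
      def backspace
        if @col > 0
          line = @lines[@row].dup
          line.slice!(@col - 1)
          @lines[@row] = line
          @col -= 1
        elsif @row > 0
          prev = @lines[@row - 1]
          @col = prev.size
          @lines[@row - 1, 2] = [prev + @lines[@row]]
          @row -= 1
        end
      end

      def to_s
        @lines.join("\n")
      end
    end
    ```

    Everything else (drawing the screen, mapping keys to these methods, highlighting) sits on top of something like this, and that is where libraries can do most of the heavy lifting.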




  • The thing is, realistically it won’t make a difference at all, because there are vast amounts of public domain data that remain untapped. The main “problematic” need for OpenAI is new content that represents up-to-date language and up-to-date facts, and my point with the share price of Thomson Reuters is to illustrate that OpenAI is already getting large enough that they can afford to outright buy some of the largest channels of up-to-the-minute content in the world.

    As for authors, it might wipe a few works by a few famous authors from the dataset, but they contribute very little to the quality of an LLM, because the LLM can’t easily judge during training unless you intentionally reinforce specific works. There are several million books published every year. Most of them make <$100 in royalties for their authors (an average book sells ~200 copies). Want to bet how cheap it’d be to buy a fully licensed set of a few million books? You don’t need bestsellers, you need many books that are merely good enough to drag the overall quality of the total dataset up.

    The irony is that the biggest beneficiaries of content sources taking a strict view of LLMs will be OpenAI, Google, Meta, and the few others large enough to basically buy datasets or buy companies that own datasets, because this creates a moat against those who can’t afford to obtain licensed datasets.

    The biggest problem won’t be for OpenAI, but for people trying to build open models on the cheap.


  • It won’t really matter, because there will continue to be other sources.

    Taken to an extreme, there are indications OpenAI’s market cap is already higher than Thomson Reuters’ ($80bn-$90bn vs <$60bn), and it will go far higher. Getty, also mentioned, has a market cap of “only” $2.4bn. In other words: if enough important sources of content start blocking OpenAI, they will start buying access, up to and including, if necessary, buying original content creators.

    As it is, while the BBC is clearly not, some of these other content providers are just playing hard to get and hoping for a big enough cash offer, either for a license or to get bought out.

    The cat is out of the bag, whatever people think about it, and sources that block themselves off from AI entirely (to the point of being unwilling to sell licenses or sell themselves) will just lose influence accordingly.

    This also presumes OpenAI remains the only contender, which is clearly not the case in the long run. Alternative models are on the rise, and while most are still not good enough, they are good enough that it’s equally clearly just a matter of time before anyone can fine-tune their own models using their own scraped data (at least, for the time being, for sufficiently rich instances of “anyone”, with the cost threshold dropping rapidly).

    In other words, it may make them feel better, but in the long run it’s a meaningless move.

    EDIT: What a weird thing to downvote without replying to. I’ve taken no stance on whether the BBC’s decision is morally right or not, just addressed that it’s unlikely to have any effect. You can dislike that it won’t have any effect, but thinking it will is naive.


  • My first “paid” programming project (I was paid in a used 20MB hard drive, which was equivalent to quite a bit of money for me at the time):

    Automate a horse-race betting “system” that, even at 14 or so, I could see was blatantly total bullshit and would just lose him money. I told the guy who hired me as much. He still wanted it, and I figured that since I’d warned him it was utter bunk, it was his problem.




  • When they say “can’t be blocked” I presume they mean “can’t be blocked with the block function in X/Twitter”. They also say it can’t be liked or retweeted.

    So far ads have been treated as sort-of regular posts that are just shown according to the ad rules rather than because they belong in the timeline under normal criteria, and you could like, retweet and block them just like any other post.

    So this is basically them treating ads as a fully separate thing rather than just a different post type.

    The article suggests they’ll still try to make them look mostly like posts, except without showing a handle etc., though, which is extra scummy.