The shitty/distracting web runs on ads, so why not make a search engine that indexes based on exactly that? You would have an experience similar to what the internet was originally: no corporate shit, no infinite scroll, just independent websites made to share real information and real knowledge, and it would still leave space for subscriptions or donations! And you could still use JS instead of being stuck with a text browser, make the occasional order from a website, or navigate the fediverse!

Not an ad blocker: the issue isn’t ads themselves, the issue is how the web is TAILORED towards ads, and how that makes for a shitty web.

Is there any tool that does that? The goal would be to block, or avoid indexing, every website somehow connected to ads. Is this even possible? I know it would block 90% of the web, but if that 10% is freedom, I want that freedom!

  • Sylra@piefed.social · 23 hours ago

    Ads aren’t only about the blatant banners on the side. There are also SEO blogspam, aggressive affiliate links and marketing, commercial websites trying desperately to sell you their services, and, more recently, AI slop.

    The only reliable way to filter all of this would be to use an intelligent LLM (ideally run locally) with your criteria in the prompt, filtering out websites and trying to find the “small and/or clean guys.” If you can’t beat AI, join them!
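
    A rough sketch of what that could look like, in Python, assuming a local Ollama instance on its default port (the model name, the criteria, and the example results are placeholders you’d swap for your own setup):

    ```python
    # Sketch: run candidate search results through a local LLM and keep only
    # the ones matching your own criteria. Assumes Ollama is running on
    # localhost:11434 with some model already pulled; adjust MODEL/CRITERIA.
    import requests

    OLLAMA_URL = "http://localhost:11434/api/generate"
    MODEL = "llama3"  # placeholder: any local model you have pulled

    CRITERIA = (
        "Keep only independent, non-commercial sites: personal blogs, hobbyist "
        "pages, forums, documentation. Reject SEO blogspam, affiliate 'top 10' "
        "reviews, storefronts, and AI-generated filler."
    )

    def keep_result(title: str, url: str, snippet: str) -> bool:
        """Ask the local model for a YES/NO verdict on one search result."""
        prompt = (
            f"{CRITERIA}\n\n"
            f"Title: {title}\nURL: {url}\nSnippet: {snippet}\n\n"
            "Answer with a single word: YES to keep, NO to reject."
        )
        resp = requests.post(
            OLLAMA_URL,
            json={"model": MODEL, "prompt": prompt, "stream": False},
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json().get("response", "").strip().upper().startswith("YES")

    # The results would come from whatever you already use (a search API,
    # a self-hosted SearXNG instance, etc.). Hard-coded here as an example.
    results = [
        ("A hand-written guide to film cameras", "https://example.net/film",
         "Notes from twenty years of shooting film..."),
        ("Top 10 BEST cameras 2025", "https://example-reviews.com/best",
         "We may earn a commission when you buy through our links..."),
    ]
    for title, url, snippet in results:
        if keep_result(title, url, snippet):
            print(url)
    ```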

    Otherwise, I like to use alternative search engines like Yandex, Qwant, Mojeek, Marginalia, and Wiby. If you’re willing to pay a bit: Kagi is really cool, check it out. I really like old-school webrings too: they are places where you can find a list of websites curated by other people.

    But friend, you gotta learn to research smarter. Learn to use search operators, and read blogs that share search tricks, such as this one: https://searchresearch1.blogspot.com
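
    For example, a few operators that most major engines support (exact syntax varies a little from engine to engine):

    ```
    "exact phrase"      only pages containing that phrase verbatim
    site:example.org    restrict results to a single domain
    -pinterest          exclude a word (or -site:pinterest.com for a whole site)
    filetype:pdf        only return PDF documents
    ```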

    • dontblink@feddit.it (OP) · 17 hours ago

      The AI thing is very cool; I think something like that already exists: agentic browsers.

      But I’m scared this would just be the next abstraction in this chain of abstractions… Corporations are already using AI to profile you even further, the internet will definitely adapt under this pressure, and I believe that in a few years agentic browsers will just become the new norm.

      Search engines at first were more objective; then people learnt to play the SEO game to attract views, search engines started showing targeted results, and stuff like Searx or YaCy came out, claiming to bring back a more objective web. There have always been ways to filter things out or to try to be more objective, but I think the evidence has shown that, as a social momentum, this stuff doesn’t work.

      Yeah, YaCy (a self-hosted crawler) is a great project, but it’s stagnant, and what prevails is still a shitty Google or Bing search engine, and the same is true for other aspects of the web.

      Social media? There were independent social networks even while the centralized stuff was rampant; now there’s the fediverse and decentralization. Which is super, super beautiful, but most people are just unaware of it. The social momentum is not pointing towards a world where every little server connects to other little servers and decides which parts to integrate with one another. That would be great, and I’d love to see it, but it’s simply not where we are currently going as a society.

      Now it’s AI’s turn. It can be a helpful tool to avoid it all for a while; stuff like agentic browsers could give us some freedom once they are actually usable and reliable, but by then the web will have evolved again, and perhaps we’ll need to find new ways to defend ourselves or to look through the bushes.

      It’s a never-ending game of hide and seek unless something really big changes. Linux, free software, open source: it’s all great, but we are continuously pushed towards the mainstream in some way or another. And most people live in the mainstream, not in the alternative, despite the alternative being objectively better. It is just unsupported by our culture.

      It’s an abstraction built on another abstraction built on another abstraction… And the web is just the clearest example of that; the very stack the web is built on is an example in itself: JS (which is already high level) > React > Next. You see? Abstraction on abstraction.

      But when will we stop playing games and just stay in the present, focusing on the core of things?

      Are we striving to reach a sort of technological ecstatic point at which everything will actually be clear? A sort of technological philosopher’s stone? And is the way to get there through collecting loads and loads of human data?

      My perspective on this is quite pessimistic, because it’s a form of cruel optimism to say that one can solve this problem individually. Changing it would require coordination among consumers, programmers, and everyone else revolving around the internet, unless we assume that AI is somehow sentient and can be better at solving our problems than we are, which I don’t rule out: faster and better at spotting and processing novelty than we are.

      But that would mean that we, as humans, would just be obsolete.

      I always come to the conclusion that maybe the web isn’t worth using as it is right now, and that maybe, to feel good, we should stop trying to relate to machines and instead just live our own biological needs… focusing on beings we can understand better… living in the present… and stop running, whether that means running away or running towards something. Rejecting culture and just staying in our own spaces, cultivating simplicity and balance.

      Sorry for the philosophical rant lmao, I guess this was more than just a technical problem for me, but thanks for your answer!

      • Sylra@piefed.social · 12 hours ago

        You can already use proprietary cloud-based LLMs like OpenAI’s ChatGPT or xAI’s Grok. If you explain to them in the prompt what “niche, enthusiast, passionate websites” are and how to find them, they can definitely help you and give you much better results than Google even in their current state. “Hallucinations” are a complete non-issue. If the LLM gives you two non-relevant links out of ten, with the rest being correct, that is still better than Google, where you might only get one relevant link out of fifty.
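
        To make that concrete, the kind of prompt I mean looks roughly like this (just an illustration; adjust the topic and criteria to your taste):

        ```
        I'm looking for niche, enthusiast, passionate websites about [topic]:
        personal blogs, hobbyist pages, forums, and wikis run by individuals.
        Exclude anything commercial: storefronts, affiliate "top 10" reviews,
        SEO content farms, and AI-generated filler. Give me ten links with a
        one-line reason for each, and say so if you are unsure whether a site
        really fits.
        ```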

        Now, thankfully, you do not have to rely on the cloud. If you have some DIY skills and a fair amount of computing power at home, you can run a setup locally that rivals cloud-based LLM searches in performance.

        Unfortunately, it is somewhat of an arms race as you said. Advertisers and marketers aim to target people who stick to defaults: the ones who search for “top 5 password managers” on Google and click the first result. That is their audience. LLMs are not a complete solution. There are clever ways to use them with well-crafted prompts, and there are simpler, less effective approaches. Those who remain with default behaviors will be absorbed by the system; those who make the effort to resist stand a better chance of avoiding marketing influence.

        As an example, some people began adding “site:reddit.com” to their searches in an attempt to get real opinions from real users. I can assure you, marketing firms have caught up with this tactic. Due to widespread astroturfing, I no longer consider Reddit a reliable source.