Is there a list of things to add to a robots.txt file to prevent AI scraping and such? (Or at least tell the scrapers to F off; I'm well aware they mostly don't care.)
That's actually quite an interesting question; I'd never thought about it. I guess this repo I just googled is already known? github.com/ai-robots-txt/ai.ro… Looks kind of promising to me.
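For a rough idea of what's in a list like that: the entries boil down to naming known AI-crawler user agents and disallowing everything for them, along these lines (only a handful of example agents here; the actual list tracks far more):

# Excerpt-style sketch of an AI-blocking robots.txt; example agents only.
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
User-agent: Google-Extended
User-agent: Bytespider
Disallow: /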
Of course you can decide that for yourself. I've just reached the point where I don't feel like sharing anymore. I've loved sharing since my BBS days in the 90s, but now I've gotten to the point where I say: don't share, and I feel better for it. Why? It's not like it used to be. Mass instead of class. You understand.
@Papierzeit I totally understand. I miss the times when you could basically control every single bit you sent out into the world. And just hang up when you were done communicating.
Adding a reply from Mastodon too, because it seems my freshly installed Friendica server isn't all that functional yet, but there's this: github.com/ai-robots-txt/ai.ro… I've turned those user agents into rules in my web server config that return a 404 whenever a request with one of them comes in, because as you said, what scraper respects robots.txt?
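In nginx, that kind of rule is roughly this shape (a minimal sketch; the agent names are just examples from such a list, and example.org is a placeholder):

# Map known AI-crawler user agents to a flag, then refuse to serve them.
# The map block belongs in the http context, outside the server block.
map $http_user_agent $ai_scraper {
    default 0;
    "~*(GPTBot|ClaudeBot|CCBot|Google-Extended|Bytespider)" 1;
}

server {
    listen 80;
    server_name example.org;   # placeholder

    # Pretend the content doesn't exist for matching agents.
    if ($ai_scraper) {
        return 404;
    }

    # ... normal site config continues here ...
}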
Robots.txt won't work. I have a long list of user agents and IP ranges in my nginx config, gathered from various sources. Any request matching those is an instant 403. If any IP accumulates enough 403s, fail2ban will ban it for a week.
Apparently Google hates me for it and has decided that I won't get on the first page anymore, even if the search term is the domain of my website. Then again, Google results are shitty already and keep getting shittier, so that might not matter in the long run.
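That kind of setup is roughly this shape (a sketch only; the IP ranges and agent names below are placeholders, not an actual blocklist):

# geo flags requests by source IP range, map flags them by user agent.
# Both blocks belong in the http context.
geo $blocked_ip {
    default         0;
    203.0.113.0/24  1;   # placeholder range (TEST-NET-3)
    198.51.100.0/24 1;   # placeholder range (TEST-NET-2)
}

map $http_user_agent $blocked_agent {
    default 0;
    "~*(GPTBot|ClaudeBot|CCBot|Bytespider)" 1;
}

server {
    listen 80;
    server_name example.org;   # placeholder

    # Instant 403 for anything matching either list.
    if ($blocked_ip)    { return 403; }
    if ($blocked_agent) { return 403; }
}

A fail2ban jail can then tail the access log, count 403s per client IP, and ban an address for a week once it crosses a threshold.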
Akseli, in reply to Jan Beta:
ZADZMO code
zadzmo.org
Jenny753, in reply to Jan Beta:
ai.robots.txt/robots.txt at main · ai-robots-txt/ai.robots.txt
Simon Zerafa, in reply to Jan Beta:
Many of the AI bots don't respect robots.txt 🫤
They have to be blocked by user agent string or by IP range.
I have something configured for my WordPress site. I'll have to dig out the details when I'm home from work 🙂
Papierzeit, in reply to Jan Beta:
You know, it's not just that. Back then it was like this:
I have a thirst for knowledge, so I drink a glass of water.
Today: this damn fire hose is too much for me. It drives me crazy.
Martin Gausby, in reply to Jan Beta:
Perhaps “please don't scrape me”, if it makes you feel good.
As you say, it will not be respected anyway.
Koen Martens, in reply to Jan Beta:
GitHub - ai-robots-txt/ai.robots.txt: A list of AI agents and robots to block.
KungFuDiscoMonkey, in reply to Jan Beta:
Dark Visitors - Track AI Agents and Control Bot Traffic