Hey Google's Association Service bot, thank you for the 400,000+ requests for the assetlinks.json file over the last 9 hours, but we truly meant it when we said 404 - File Not Found. KThxBye.

OpenStreetMap Ops Team

The GoogleAssociationService bot was kind enough to ask 1,000,000+ times yesterday for the same file, from 4000+ Google IP addresses. The answer was the same: 404 - File Not Found. The User-Agent does not provide a support link, unlike their other bots.

@osm_tech The only solution I can see for all this shit is the IDP.

(And, because search-engines are so clueless about the history of the 'net: catb.org/jargon/html/I/Interne)

@mikro2nd
As much as I dislike Google, a lot of people & browsers still seem to be using them as a search engine...

Applying the IDP to Google's IP ranges would mean that nobody could find #OpenStreetMap via Google search, and would instead probably get scammers as the first result. I don't think that is an ideal outcome.
@osm_tech

@mnalis @mikro2nd For now we've hardcoded a 429 - Too Many Requests response matching on UA + URL.
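
(As a rough illustration of that kind of match-and-reject rule, here is a minimal Python/WSGI sketch. It is not OSM's actual implementation, which presumably sits at the web-server layer; the /.well-known/ path is the standard Digital Asset Links location and is assumed here, since the thread only names the file.)

    # Minimal sketch: short-circuit one bot's UA + URL with a hard-coded 429.
    # Illustrative only; assumes a Python/WSGI stack, not osm.org's real setup.
    BLOCKED_UA = "GoogleAssociationService"        # substring of the bot's User-Agent
    BLOCKED_PATH = "/.well-known/assetlinks.json"  # assumed standard location

    def reject_bot(app):
        def middleware(environ, start_response):
            if (environ.get("PATH_INFO") == BLOCKED_PATH
                    and BLOCKED_UA in environ.get("HTTP_USER_AGENT", "")):
                # Answer before the request reaches the application stack,
                # which is why the load impact can stay minimal.
                start_response("429 Too Many Requests",
                               [("Content-Type", "text/plain")])
                return [b"429 Too Many Requests\n"]
            return app(environ, start_response)
        return middleware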

@osm_tech
It would be interesting to know if it helped... They should've stopped on that 404 too, so obviously there is buggy code involved :(

Returning a fake 200 with some dummy answer might be the next suggestion if the 429 doesn't work (it should at least exercise a different code path).
@mikro2nd

@mnalis @mikro2nd Briefly tested 200 responses; they didn't seem to have any impact. The 429 responses cut through the stack and have minimal load impact now. Still ~12 req/second.

@osm_tech personally, I'd block all the #GAFAMs by their entire #ASN|s!

  • Fuck the crawlers; #Blackholing of their #DDoS attacks is the only feasible option!

  • Also send an #AbuseReport every time they try that shite, to them and to all the providers between you and them...
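
(A sketch of how one might act on the ASN idea at the application level, using only the Python standard library. The prefix list would in practice come from a BGP/routing-data source; the two prefixes below are illustrative Google ranges, not an authoritative or complete list.)

    import ipaddress

    # Illustrative prefixes for one ASN (placeholders, not a complete list);
    # in practice you'd pull these from a routing-data source.
    ASN_PREFIXES = [ipaddress.ip_network(p)
                    for p in ("8.8.8.0/24", "66.249.64.0/19")]

    def in_blocked_asn(ip: str) -> bool:
        # True if the client address falls inside any listed prefix.
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in ASN_PREFIXES)

    print(in_blocked_asn("66.249.66.1"))   # True: inside 66.249.64.0/19
    print(in_blocked_asn("203.0.113.7"))   # False: documentation range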

@osm_tech what if you explicitly ban that path in your robots.txt file?
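
(For reference, that robots.txt rule would be short. This assumes the bot requests the file from the standard /.well-known/ location, and that it consults robots.txt at all, which its behaviour past a million 404s suggests it may not.)

    User-agent: *
    Disallow: /.well-known/assetlinks.json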

@osm_tech it’s becoming clear that we need to be able to block all crawlers somehow