PSA: Allow and Disallow /foo/*/bar/ aren't supported in Gemini

Quick reminder: robots.txt in Gemini only supports the original robots.txt standard, which is a very small subset of modern robots.txt features. Essentially the only rule type is `Disallow`, and even that only defines a path prefix. It does not support wildcard characters in the middle of a rule.

So all of the following robots.txt lines are ignored:

Allow: /~fuagem/

Disallow: /~id10t/*/tree

crawl-delay: 1
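To show what "prefix only" means in practice, here is a minimal Python sketch of a matcher that follows only the original 1994 rules as summarized above: it reads `User-agent` and `Disallow`, compares each `Disallow` value as a literal path prefix (so `*` is an ordinary character), and skips every other field. The function names, the agent name `indexer`, and the path `/~fuagem/gemlog/` are hypothetical examples, not taken from any real crawler.

```python
# Sketch of an original-1994-style robots.txt matcher (assumed behavior):
# only User-agent and Disallow are read, and each Disallow value is a
# literal path prefix, so "*" is just an ordinary character here.

def parse_original(robots_txt: str) -> dict[str, list[str]]:
    """Map each (lowercased) user-agent to its Disallow prefixes."""
    rules: dict[str, list[str]] = {}
    agents: list[str] = []
    seen_disallow = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_disallow:  # a User-agent after rules starts a new record
                agents, seen_disallow = [], False
            agents.append(value.lower())
            rules.setdefault(value.lower(), [])
        elif field == "disallow":
            seen_disallow = True
            if value:  # an empty Disallow allows everything
                for agent in agents:
                    rules[agent].append(value)
        # Allow, Crawl-delay and any other field fall through and are ignored
    return rules


def is_allowed(rules: dict[str, list[str]], agent: str, path: str) -> bool:
    """Use the agent's own record if present, else the "*" record."""
    prefixes = rules.get(agent.lower(), rules.get("*", []))
    return not any(path.startswith(prefix) for prefix in prefixes)


# Hypothetical robots.txt mixing original and de facto lines.
robots = """User-agent: *
Disallow: /~fuagem/
Allow: /~fuagem/gemlog/
Disallow: /~id10t/*/tree
Crawl-delay: 1
"""
rules = parse_original(robots)
print(is_allowed(rules, "indexer", "/~fuagem/gemlog/post.gmi"))  # False: Allow is ignored
print(is_allowed(rules, "indexer", "/~id10t/repo/tree/README"))  # True: "*" is literal
```

The wildcard line still parses, but it only blocks paths that literally begin with `/~id10t/*/tree`, which in practice means it blocks nothing.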

Solderpunk's Robots.txt for Gemini companion spec

Posted in: s/Gemini

🧇 Acidus

24 hours ago

2 Comments

โ˜€๏ธ sbr ยท Jul 07 at 12:29:

Does that mean the crawler is back to life? ;-)

🚀 clseibold · 21 hours ago:

Note to others: AuraSearch supports all of those lines in robots.txt, as well as wildcards. So even though Kennedy's crawler doesn't support these de facto standard features, you can still use them for AuraSearch's crawler without affecting Kennedy's crawler.

Kennedy is not the only search engine that exists, and it would be more work than it's worth to disable support for these robots.txt features in the library I'm using just because Solderpunk's version of the "spec" wants to be overly restrictive. So if you want to use them specifically for AuraSearch, you can; just keep in mind that they only take effect for AuraSearch.

The lines mentioned in the OP are part of the *de facto* standard, and because they are so widespread, support for them will never be removed from AuraSearch.
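For comparison, here is a minimal Python sketch of the wildcard matching that de facto parsers perform, loosely following RFC 9309: `*` matches any run of characters, a trailing `$` anchors the end of the path, and when several `Allow`/`Disallow` rules match, the longest pattern wins, with `Allow` winning ties. This is not the library AuraSearch uses (its internals aren't described here), and it leaves out `Crawl-delay`, which RFC 9309 doesn't define. The function names and example paths are hypothetical.

```python
import re


def rule_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt path pattern with * and $ into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, starting at the path's start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed_wildcard(rules: list[tuple[str, str]], path: str) -> bool:
    """rules: (kind, pattern) pairs for one user-agent group, where kind
    is 'allow' or 'disallow'. The longest matching pattern wins; on a
    tie, 'allow' wins. No matching rule means the path is allowed."""
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty rule matches nothing
        if rule_to_regex(pattern).match(path):  # match() anchors at the start
            length = len(pattern)
            if length > best_len or (length == best_len and kind == "allow"):
                best_len = length
                allowed = kind == "allow"
    return allowed


# Hypothetical rules for one "User-agent: *" group.
rules = [
    ("disallow", "/~fuagem/"),
    ("allow", "/~fuagem/gemlog/"),
    ("disallow", "/~id10t/*/tree"),
]
print(is_allowed_wildcard(rules, "/~fuagem/gemlog/post.gmi"))  # True: longer Allow wins
print(is_allowed_wildcard(rules, "/~id10t/repo/tree/README"))  # False: "*" matches "repo"
```

Measuring specificity by pattern length is one common reading of RFC 9309's "most specific match" rule; real parsers may differ in details such as percent-encoding.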
