Geminispace is almost certainly being fed into LLMs
Or: “Did you ever have the feelin’ you was bein’ watched?”
First published February 1, 2025.
Last updated (minor edits) March 17, 2025.
Ever since LLM scraping became a thing done by lots of companies for lots of companies, I’ve wondered if and when they’d start scraping Geminispace.
On one hand, there’s not a whole lot of stuff written in Geminispace.
On the other hand, writing a Gemini client is easy.
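To put a number on “easy”, assume the Gemini protocol as specified: TLS on port 1965, a request that is just the absolute URL plus CRLF, and a response that is a one-line “<STATUS> <META>” header followed by the body. Under those assumptions a bare-bones fetcher fits in a few dozen lines of Python. This is a hypothetical sketch, not any real crawler’s code: “gemini_fetch” and “parse_response” are made-up names, and it skips the trust-on-first-use certificate pinning, redirects, and client certificates a real client would handle.

```python
import socket
import ssl
from urllib.parse import urlparse

GEMINI_PORT = 1965  # default Gemini port


def parse_response(raw: bytes) -> tuple[str, str, bytes]:
    """Split a raw Gemini response into (status, meta, body)."""
    # The header is "<STATUS><SPACE><META>" ending in CRLF; the rest is the body.
    header, _, body = raw.partition(b"\r\n")
    status, _, meta = header.decode("utf-8").partition(" ")
    return status, meta, body


def gemini_fetch(url: str, timeout: float = 10.0) -> tuple[str, str, bytes]:
    """Fetch one gemini:// URL (hypothetical helper: no redirects, no client certs)."""
    parsed = urlparse(url)
    host = parsed.hostname
    port = parsed.port or GEMINI_PORT
    context = ssl.create_default_context()
    # Capsules usually use self-signed certificates that clients pin on first
    # use (TOFU). This sketch simply skips verification instead.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # The entire request is the absolute URL followed by CRLF.
            tls.sendall(url.encode("utf-8") + b"\r\n")
            chunks = []
            # The server closes the connection once the body has been sent.
            while data := tls.recv(4096):
                chunks.append(data)
    return parse_response(b"".join(chunks))
```

Usage would look like “status, meta, body = gemini_fetch("gemini://example.org/")” (example.org standing in for any capsule); a status of “20” means success, with META holding the MIME type, typically “text/gemini”. Loop that over every link you find and you have a crawler.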
On the third hand, I’ve heard that basically the entire Web has been scraped (or blacklisted) and LLMs just plain do better with more input.
On the…I’m not sure which hand is the “gripping” hand, but anyway…LLMs basically process files as Markdown. Microsoft has a convert-to-Markdown utility so it can properly ingest all kinds of things (HTML, PDFs, etc.) into its LLMs. You know what looks a LOT like Markdown? Gemtext.
microsoft/markitdown on GitHub
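How close is “a LOT”? Here’s a rough sketch, assuming the standard gemtext line types: headings (“#”), list items (“* ”), quotes (“>”) and preformat toggles (“```”) already mean roughly the same thing in Markdown, so only link lines (“=>”) need rewriting. “gemtext_to_markdown” is a hypothetical name; this is not markitdown’s code or API.

```python
def gemtext_to_markdown(gemtext: str) -> str:
    """Convert gemtext to Markdown, rewriting only link lines."""
    out = []
    preformatted = False
    for line in gemtext.splitlines():
        if line.startswith("```"):
            # Preformat toggles double as Markdown code fences.
            preformatted = not preformatted
            out.append(line)
        elif preformatted:
            # Inside a preformatted block everything is literal text.
            out.append(line)
        elif line.startswith("=>"):
            # Link line: "=>" URL [optional label]
            parts = line[2:].split(maxsplit=1)
            if not parts:
                out.append(line)  # malformed link line; keep it as-is
                continue
            url = parts[0]
            label = parts[1].strip() if len(parts) > 1 else url
            out.append(f"[{label}]({url})")
        else:
            # Headings, list items, quotes and plain text read the same in Markdown.
            out.append(line)
    return "\n".join(out)
```

One real difference this ignores: gemtext treats every text line as its own paragraph, while Markdown joins adjacent text lines into one, so a fuller converter would separate them with blank lines. Either way, that’s most of a converter.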
Today, on February 1, 2025, I saw this post:
Rob S., “Trampled By the Elephants”
Moneyest quote:
I'm not the only one dealing with this issue. I've recently seen many posts on Antenna of site operators, both Gemini and otherwise, dealing with extremely aggressive crawlers and bots. Several originate from Chinese cloud hosting providers, but they come from all over the world, AWS also being a common culprit. Of course, the mainline Internet has dealt with flooding issues from cloud providers for years, and the problem shows no signs of getting better. But I haven't seen the problem this pervasive in the small Web before.
⁂
So if you’ve ever wanted to feed LLMs your point of view without bothering to write/generate HTML or type stuff on Reddit, do I have good news for you!
If you don’t, well…then I have bad news for you.
⏚