https://www.washingtonpost.com/technology/interactive/2023/ai-chatbot-learning/#lookup-table
excited to see everyone search for themselves in this lookup of sites that went into Google's C4 dataset (my website is ranked 5,681,776)
@ingrid I'm 1,063,823 😬
@nev I can't decide if this will become a clout metric or just determine how many pennies get parceled out to us in the class action settlement
@ingrid my knitting blog is apparently ranked 880,706.
@writerethink I'm excited to see this become a clout thing ("is your website even in the C4 dataset?")
@ingrid the site I worked on most extensively is 200k something
@ingrid Sounds good to me
@ingrid I'm at 951,389! (a local transit advocacy website I write) Fascinating read.
@c_9 it's a really good article!
Beat me, I'm 11,400,284. Still overrepresenting old white guys in data models.
@ingrid oh no, my composer website is 667,852
they didn't get my siddur site tho lmao
@ingrid happy not to see myself in there!
@ingrid I'm 5,485,202 (business site) and 1,492,191 (currently hidden personal blog), for a total of 0.00001% of tokens. They didn't get my photoblog at least.
I wonder what 0.00001% of those companies' profits is?
I wonder if it's more or less than the cost of a brick thrown through their window?
@ingrid My personal blog is 707,095. ¯\_(ツ)_/¯ Don’t like it.
@ingrid some people will pay for a WaPo subscription just to vanity search themselves but I am not one of those people. Nice try, WaPo.
@ingrid it looks like a good amount of mastodon instances have been fed into the training data somewhere too which is upsetting
@ingrid My highest-ranked site is somewhere below 10M. I’m perversely pleased my personal site beat my work site.
@ingrid I'm pretty far down there myself 🙂
3,485,216 allthepages.org 4.6k 0.000003%
do I feel good or bad that my website is ranked higher than the website of someone I don't like very much, this is hard