Mastodon - "Decentralized" Twitter Knockoff & Rat King Breeding Ground

Not trying to necro thread but I felt it made more sense just to keep it in this thread.

Anyone had any experience with scraping a public mastodon profile? I know I can probably code something but for thousands of "toots" I was hoping there was something that already did it well.
I haven't used it in a while, but I'm guessing it should still work. Try snscrape.
 
I haven't used it in a while, but I'm guessing it should still work. Try snscrape.
Sadly no. At least not on my loonix machine. Instant errors on run, and I had to use pipx just to install it. Seems like it's been 1+ years since the last update. Mastodon doesn't seem to throw the CF captcha as often, so maybe something similar to twint can be done. Any advice is appreciated; I don't want to fork this shit.
 
Not trying to necro thread but I felt it made more sense just to keep it in this thread.

Anyone had any experience with scraping a public mastodon profile? I know I can probably code something but for thousands of "toots" I was hoping there was something that already did it well.
Try asking @Bloom Worm Cross Field. He recently did some Mastodon scraping in the Drew DeVault thread.
See: https://kiwifarms.st/threads/drew-chadwick-devault-ddevault-sircmpwn.175606/post-19655277
 
Anyone had any experience with scraping a public mastodon profile? I know I can probably code something but for thousands of "toots" I was hoping there was something that already did it well.
The Mastodon API is nice enough that you can code up a scraper just by looking at which endpoints get queried when you load a profile. You don't even need to read the documentation: the frontend web apps make all their calls client-side, so the Network tab in dev tools shows a full log of what you need to query. Everything comes back as JSON, and every reply even tells you how many requests you have left before you're rate limited. I coded up something that scrapes all the media from a given profile in about two hours.
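A minimal sketch of that approach, assuming the documented public endpoints (`/api/v1/accounts/lookup` and `/api/v1/accounts/{id}/statuses`); the instance and username below are placeholders:

```python
import requests

INSTANCE = "https://mastodon.social"  # placeholder instance
USERNAME = "Gargron"                  # placeholder account


def next_max_id(page):
    # Paging goes backwards in time: the next request asks for statuses
    # older than the smallest (oldest) id on the current page.
    return min(page, key=lambda status: int(status["id"]))["id"]


def fetch_statuses(instance, username, limit=40):
    """Pull an account's public statuses through the same endpoints the web UI hits."""
    with requests.Session() as s:
        # resolve the username to a numeric account id
        acct = s.get(f"{instance}/api/v1/accounts/lookup",
                     params={"acct": username}).json()
        statuses, max_id = [], None
        while True:
            params = {"limit": limit}
            if max_id:
                params["max_id"] = max_id
            resp = s.get(f"{instance}/api/v1/accounts/{acct['id']}/statuses",
                         params=params)
            page = resp.json()
            if not page:
                break
            statuses.extend(page)
            # resp.headers["X-RateLimit-Remaining"] shows the quota left
            max_id = next_max_id(page)
        return statuses
```

Call it as `fetch_statuses(INSTANCE, USERNAME)`; each status is the same JSON object you see in the Network tab.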
 
Not trying to necro thread but I felt it made more sense just to keep it in this thread.

Anyone had any experience with scraping a public mastodon profile? I know I can probably code something but for thousands of "toots" I was hoping there was something that already did it well.
Python:
import requests

from sys import argv
from time import sleep


def ap_get_doc(session, url) -> dict:
    resp = session.get(url)
    if not resp.ok:
        print(resp)
        return {}
    return resp.json()

def webfinger(handle):
    domain = handle.split("@")[-1]
    webfinger_url = \
        f"https://{domain}/.well-known/webfinger?resource=acct:{handle}"
    resp = requests.get(webfinger_url)
    if not resp.ok:
        raise Exception(resp.text)
    document = resp.json()
    for link in document["links"]:
        if link["type"] == "application/activity+json":
            # This link is the AP ID
            return link["href"]

    raise Exception(f"no ActivityPub link in webfinger response for {handle}")



def page_callback(page):
    # do whatever here
    print(page)
  

def download(user_url):

    s = requests.Session()
    s.headers.update({
        "Accept": "application/activity+json",
        "User-Agent": "frank 0.0"
    })

    document = ap_get_doc(s, user_url)
    outbox = document.get("outbox", None)
    if outbox is None:
        return
      
    document = ap_get_doc(s, outbox)
    first = document.get("first", None)
    if first is None:
        return
  
    document = ap_get_doc(s, first)
    while document:
        objects = document.get("orderedItems", [])
        page = []

        for obj in objects:
            if obj["type"] != "Create":
                continue
            note = obj["object"]
            if note["type"] != "Note":
                continue
            source = note.get("source")
            if isinstance(source, dict):
                content = source.get("content", "")
            # old Pleroma returns "source" as a bare string
            elif isinstance(source, str):
                content = source
            else:
                continue
            if not content.strip():
                continue
            page.append({
                "id": note["id"],
                "published": note["published"],
                "content": content,
            })
        page_callback(page)

        # progress marker
        print('.', end='', flush=True)

        next_page = document.get("next")
        if next_page is None:
            break
        sleep(0.3)
        document = ap_get_doc(s, next_page)
    s.close()
    print()
 
if __name__ == '__main__':
    # call with user@domain.com as the username, script handles the rest
    handle = argv[1]
    ap_id = webfinger(handle)
    download(ap_id)

Here's a snippet from something I wrote a while ago. It pages through the posts in an AP actor's outbox and prints the full JSON object for each one that's a regular post (a "Note" in the protocol vernacular). Python 3, and you only need requests.
You can very easily do this in shell scripts or similar, but maybe this will work for you. If you just want the content of posts, you can have it save those to a file or whatever in the callback; as-is this doesn't do any pre-processing.
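For example, a `page_callback` that just appends every post to a JSONL file (the filename is arbitrary) could look like:

```python
import json


def page_callback(page):
    # append one JSON object per line; "toots.jsonl" is an arbitrary filename
    with open("toots.jsonl", "a", encoding="utf-8") as f:
        for post in page:
            f.write(json.dumps(post, ensure_ascii=False) + "\n")
```

Drop it in place of the stub above and the script writes out every page as it walks the outbox.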
 
From a UK furry instance, woof.group, regarding the Online Safety Act (OSA) and interaction with OFCOM:

Letter to Ofcom Regarding the Online Safety Act
Woof.group Blog (archive.ph)
By Kyle Kingsbury
2025-01-28 14:48:43GMT
This private information is unavailable to guests due to policies enforced by third-parties.
Updates on the OSA
Woof.group Blog (archive.ph)
By Kyle Kingsbury
2025-02-05 01:52:55GMT
 
Did I miss something and the gooners who run woof.group were harassed by Ofcom before? Or is this just another act of social justice warriorship? A US company who hosts their stuff at another US company trying to show how much they performatively care about their UK userbase?
 
Did I miss something and the gooners who run woof.group were harassed by Ofcom before? Or is this just another act of social justice warriorship? A US company who hosts their stuff at another US company trying to show how much they performatively care about their UK userbase?
Nobody wants to be subject to the first international lolsuit, but I really want Donald Trump to have to mention some random furry porn website when discussing it.
 
TechCrunch: Mastodon says it doesn’t ‘have the means’ to comply with age verification laws (archive)
The statement follows a lively back-and-forth conversation earlier this week between Mastodon founder and CEO Eugen Rochko and Bluesky board member and journalist Mike Masnick. In the conversation, published on their respective social networks, Rochko claimed, “there is nobody that can decide for the fediverse to block Mississippi.” (The Fediverse is the decentralized social network that includes Mastodon and other services, and is powered by the ActivityPub protocol.)

“And this is why real decentralization matters,” said Rochko.

Masnick pushed back, questioning why Mastodon’s individual servers, like the one Rochko runs at mastodon.social, would not also be subject to the same $10,000 per user fines for noncompliance with the law.
TechCrunch: Mississippi’s age assurance law puts decentralized social networks to the test (archive)
 
My mastodon feed is carefully curated, I only follow people who are normal. A bit of everything, from the (milder) furries who post interesting non-furry shit to the more normal poa.st posters who only sometimes post ironic soyjack memes. This way my feed is mostly SFW, safe for sanity and at the same time I get the glimpses of local dramas that these people boost. Sometimes my posts get boosted enough that they leave this circle of normal people.

Whenever that happens I have a game I play. You could call it a drinking game if you add some rules as to when you down a shot. The game has no name; "pornodon" is a candidate, but you can suggest your own.

I go through the profiles of new people who interacted with my posts and look at the media tab. You lose points whenever they post lewd shit: -1 for lewd furry art, -2 for ugly tranny boobs, and -5 for full frontal nudity. You can down shots instead of counting the negative points.

The fun part is the context. I have yet to see a photo taken in a room without clothes or trash on the floor. I should post the better specimens to the tranny sideshows. Then there's seeing that these people follow or boost stuff from really nice people, like Matt Parker from Standup Maths, or their local politicians (pretty common on German Mastodon among die Linke followers), and a few posts down there are horrors well within my comprehension, or flicks that would make H. R. Giger pale.

The mastodon feed is my little personal void I sometimes look into, as a sport.
 