Preservetube - A Youtube archival site.

Youtube is estimated to have an exabyte of video data online right now.
I'd be interested to see the statistics on how much of that is comprised of videos that haven't been watched in over a decade, or ever. Surely they could nuke a couple million hours of MLG montage parodies and craft beer podcasts without anyone noticing?
 
I'd be interested to see the statistics on how much of that is comprised of videos that haven't been watched in over a decade, or ever. Surely they could nuke a couple million hours of MLG montage parodies and craft beer podcasts without anyone noticing?
We all have India Derangement Syndrome at this point, but nonetheless I wonder what would happen if Youtube just nuked all content from India (and to a greater extent, Pakistan). If you just go to the main page and view the most recent uploads, it's just an endless stream of useless shit from pajeets. Not to mention they're the ones who run most of the AI slop channels, as well as the channels who steal other people's content to leech off ad revenue such as unauthorized Shorts channels that upload snippets from other videos.
 
I forgot to bring this up before, but why would someone need to archive a channel like this?


Please note that only the videos that feature the little girls have been archived.

- Edit -

Another channel from a little girl that is being archived. For some reason...

If you want to save yourself from some legal troubles later, look up "night routine" and delete those files. You'll know them when you see them.
 
If you want to save yourself from some legal troubles later, look up "night routine" and delete those files. You'll know them when you see them.
Are you telling me that or are you quoting my post to refer to a third person? I'm not saving or archiving videos of little girls, I'm no sicko, I just found them when visiting the "Latest" page. In fact right now there are at least 3 videos of little girls on that page that someone archived... It's weird as fuck, and I don't think that stuff needs to be saved.
 
Are you telling me that or are you quoting my post to refer to a third person? I'm not saving or archiving videos of little girls, I'm no sicko, I just found them when visiting the "Latest" page. In fact right now there are at least 3 videos of little girls on that page that someone archived... It's weird as fuck, and I don't think that stuff needs to be saved.
How the hell did I screw up this badly? Sorry. I meant for PreserveTube... forget it. Retardation on my part.
 
Did the number of archives go down when you added that warning/popup that shows up when you go to archive a video? I know that it stopped me from archiving normally low value videos that I would've archived for the sake of "Archive everything". Personally, I wouldn't mind if you put up banner ads on the site if it means we can still keep archiving videos.
At one point for a short amount of time there were ads. I turned off my adblock when I noticed it was blocking stuff on the site. Not sure why they got rid of them with a push for donations instead of having both since you know most people are probably not gonna donate. I get ads are annoying but as long as they aren't intrusive and aren't a crazy resource hog I don't care if it's able to help keep the site afloat. (Maybe have a message to urge people to turn off their adblockers?)
 
I'd be interested to see the statistics on how much of that is comprised of videos that haven't been watched in over a decade, or ever. Surely they could nuke a couple million hours of MLG montage parodies and craft beer podcasts without anyone noticing?
Most of it is actually just garbage made by ai accounts in the past 2 years. Youtube has been trying to nuke them off the site and demonetize videos which have ai stuff in them but they're only removing the heavy hitters (over 10k subscribers). That and Indians posting... whatever Indians post... 🤷‍♂️
 
Rather than just a deletions page, have a "quality decrease" page which is much more aggressive about which videos are included, which drastically lowers the quality of the video to take up 5x-10x less space.

If someone is upset about the quality loss, they are free to re-upload the video, which replaces the low quality version with standard quality again and resets the timer on decreasing its quality again for a few months. Or people can email you about things they really care about to get added to an exceptions list which is never reduced in quality.
Perhaps in lieu of permanent reduction in quality, what about tiered storage? Have a normal, faster storage made out of SSDs or whatever, setup for the small bit of videos that are usually or recently watched, and is fast enough to stream at full quality. Then, as videos sit unwatched, they fall out of that smaller, higher speed storage system and get relegated away to a bigger, but slower system made out of traditional hard drives. Then all that remains in the fast storage would be a thumbnail sized downsample of the video at 240p or some other small size that takes up relatively little storage space.

Perhaps as a funding incentive, if people want to pull a video back into fast cache at full quality, they can pay a small fee, $5 or $20 or something.
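A rough sketch of how the tiering idea above could be decided per video. Everything here is an assumption for illustration: the `Video` record, the tier names, and the 90-day idle cutoff are made up, not anything the site actually uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

FAST = "fast"  # SSD tier: holds the full-quality copy
SLOW = "slow"  # HDD tier: full copy lives here, only a small preview stays on fast storage


@dataclass
class Video:
    video_id: str
    last_watched: datetime
    tier: str = FAST


def assign_tier(video: Video, now: datetime,
                max_idle: timedelta = timedelta(days=90)) -> str:
    """Demote a video to the slow tier once it has sat unwatched for too long."""
    video.tier = FAST if now - video.last_watched <= max_idle else SLOW
    return video.tier


def promote(video: Video, now: datetime) -> None:
    """Pull a video back into the fast tier, e.g. after a view or a paid request."""
    video.last_watched = now
    video.tier = FAST


now = datetime(2025, 1, 1)
stale = Video("abc", last_watched=now - timedelta(days=200))
print(assign_tier(stale, now))  # slow
promote(stale, now)
print(stale.tier)               # fast
```

Running `assign_tier` over the catalogue on a schedule would be enough to start with; the actual copying between storage systems is the expensive part and is left out here.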

Tape storage is the absolute cheapest option TB for TB, for example this 18 TB HPE LTO-9 tape costs $92.95 so each TB costs $5.16
Only if you look at the cost of media. A compatible drive costs roughly 60x that at $5600. I think tape is nifty, too, but it remains limited to big players who have to store terabytes of information offsite because they're subject to regulations, and the drives are priced accordingly. That price isn't just manufacturers being greedy assholes taking advantage of deep pockets. Tape transports require some fairly high-precision parts that are not trivial or cheap to make. If you've ever had a video tape or multi-track audio tape transport apart, you'd understand.
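To put numbers on the media-versus-drive point, here is a small worked calculation using the prices quoted above ($92.95 per 18 TB tape, $5600 for a drive). The $15/TB figure for hard drives is a placeholder assumption for comparison, not a quoted price.

```python
import math
from typing import Optional


def media_cost_per_tb(media_price: float, media_tb: float) -> float:
    """Cost per TB counting only the tape cartridge."""
    return media_price / media_tb


def total_cost_per_tb(drive_price: float, media_price: float,
                      media_tb: float, tapes: int) -> float:
    """Cost per TB once the one-off drive purchase is spread over `tapes` cartridges."""
    return (drive_price + tapes * media_price) / (tapes * media_tb)


def tapes_to_break_even(drive_price: float, media_price: float,
                        media_tb: float, alt_cost_per_tb: float) -> Optional[int]:
    """Smallest number of tapes at which tape beats an alternative priced per TB."""
    saving_per_tape = alt_cost_per_tb * media_tb - media_price
    if saving_per_tape <= 0:
        return None  # tape media alone is already more expensive
    return math.ceil(drive_price / saving_per_tape)


print(round(media_cost_per_tb(92.95, 18), 2))        # 5.16
print(tapes_to_break_even(5600, 92.95, 18, 15.0))    # 32 (assumed $15/TB for HDDs)
```

Under that assumed hard drive price, the drive only pays for itself after about 32 tapes, or 576 TB, which is the scale at which tape starts to make sense.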
 
If anyone has any ideas on how this can be worked on, I'd be more than open to them.
My suggestion would be to add a field for "reason for archival" and periodically check the database and prune archives with no good reason. Would save money but cost time.
 
My suggestion would be to add a field for "reason for archival" and periodically check the database and prune archives with no good reason. Would save money but cost time.
If you could crowd source this maybe but just for him alone it wouldn't be feasible. Then you'd be getting into trying to define what is worthwhile in a way that retards can make consistent judgements. That might work but has a lot of pitfalls.

If there is some kind of useful filtering work that could be effectively crowd sourced, maybe make it required for viewing a video and have it serve dual use as a captcha? Like the old text transcription ones on 4chan.

Off topic but this made me laugh
1773610867057.png
 
Adding a "reason for archival" would be useful, since it'd separate degenerates wanting to save their porn stash from people wanting to archive the degeneracy of said degenerates.
I wish @PreserveTube would purge all videos featuring little girls. A couple of weeks ago, there were videos of a girl, about 10 years old, practicing gymnastics in a leotard. Why would anyone need to archive that?

Every now and then some sicko shit appears in the latest page. I'm not asking him to monitor that page 24/7, but I wouldn't have any problem reporting it if I saw it.
 
I wish @PreserveTube would purge all videos featuring little girls. A couple of weeks ago, there were videos of a girl, about 10 years old, practicing gymnastics in a leotard. Why would anyone need to archive that?

Every now and then some sicko shit appears in the latest page. I'm not asking him to monitor that page 24/7, but I wouldn't have any problem reporting it if I saw it.
The general consensus of the thread mostly supports filtering videos before they are downloaded, through a written reason, a user account or a block and removal of anything from the third world, as archiving anything and everything with no rhyme or reason is a noble goal but completely unreasonable. A secondary task would be to go back and systematically purge shit that has no recognizable archival value such as the stuff Banana mentioned above.

@PreserveTube has not been online for 3 weeks. Hopefully he is taking a well deserved break and will be willing to consider or discuss some of the ideas put forth in this thread at some time in the future.
The site itself is a brilliant resource for catching that one in a hundred removed YT video from 2nd amendment or lolcow YouTuber that was missed by other page archivers.

At the end of the day, it is his site, so he is freely able to tell us to go fuck ourselves and archive our own shit, but hopefully he finds a viable solution to these problems so that is not necessary.
 
One of my favorite videos from like, 2007/2008 is just totally gone from the internet. It was simply titled "Bunny Yelling" iirc, and it's exactly what's on the box; a short video of a woman yelling "Bunnyyyy!" to her pet rabbit, and the rabbit makes some cute surprisingly noisy sounds back. Rabbits don't usually vocalize very much so it was kind of a fascinating video, especially because it was just sitting on the person recording the video. When I tried to search it, instead I found the following:
  • Random episodes from TV cartoons that are already available on other piracy sites
  • Clips from cartoon TV shows that are, again, already available on other piracy sites
  • AMV garbage
  • Porn 🤢
  • Some weird racist videos from Chinese Historian that picked up because he used words like "snow bunny" and "rice bunny"
  • Other shitty raceplay videos that were picked up because of keywords like those
  • Furry inflation and "weight gain" EPI shit (:cryblood:)
  • Vtuber crap
  • Other awful gooner content, generally with blurred anime girl tits in the thumbnail
  • YouTube-exclusive kids content that nobody is probably going to watch, probably archived from archiver 'tism that it will be gone later
  • A few actual rabbit videos, but not the one I was looking for
This was all from just searching the word "Bunny". I searched "Bunny Yelling" first and got nothing. I didn't think to archive the video myself years ago because I never would've guessed that YouTube would take it down. I am pretty sure the video is privated by the owner but I'm not confident. There is a chance they privated it either for personal reasons or from crazies assuming the rabbit was abused somehow to make the sounds. The video is old and very pixelated like 240p or 360p.

The site could probably use some kind of content pruning, particularly for TV shows that are definitely going to be saved somewhere else. The creepy EPI furry videos also deserve to be lost forever.
I have a heartwarming update to this. The video was recovered in all its loud, crunchy glory.
Behold,


BUNNYYY!!
Just in time before Easter, too!

Someone made a post about it on the Tip of my Tongue Reddit 10 years ago, and someone else posted a link to this video in reply. However, the person who originally posted the link ended up deleting their post. Unddit revealed the YouTube link they posted. From there, plugging in the YouTube link back into Wayback Machine actually pulled up the video.

The most mysterious part of this video is that the original poster is still largely unknown, and that the link that was given on Reddit of this video mentioned this was a reupload from the original. Again, this video is old as fuck, like 2007 to 2008 YouTube era old. It very likely does not have any higher definition than this crunchy mess, but here it is.

If I had to take a wild guess on why this specific video seems to keep disappearing, it might be because some people speculate that the woman may be "abusing" the rabbit by yelling loudly and hurting its ears enough to yell back at her. This is silly as fuck to me but honestly, if you've seen any pet enthusiasts online, you should know they will blow the most innocuous shit completely out of proportion while sleeping on sincerely disturbing content.

NOW, what happens if I plug the YouTube URL into PreserveTube to find any archive of it....?!
fucking nothinggg.png
......Amazing. Incredible. Thank you, Preservetube......
🐇
 
Sorry for the late replies, everyone. End of quarter and exams don't mix too well. I really appreciate everyone's support.

Rather than just a deletions page, have a "quality decrease" page which is much more aggressive about which videos are included, which drastically lowers the quality of the video to take up 5x-10x less space.
I looked into this initially, and ruled it not being worth it. Encoding takes a lot of time, and also resources -- trying to downscale a 15min 1080p video to 480p took ~30min.
15 TB in a year with probably accelerating amounts of archiving and nobody watching all the archived videos. If there's people that use preservetube as their google drives
Bingo. Also an ungodly amount of people using it as an alternative to Invidious when the main public instances are offline. I've also seen a weird amount of people archiving their own personal videos.
I don't really see a solution besides shilling more and asking for donations.
Neither do I. Another problem is that most people that watch the archives don't watch them on the site, they watch them via some guy that reuploaded it back onto Youtube, potentially even with adsense enabled. Reaching those users via donation banners is impossible.
There's a lot of YouTube on the Kiwi Farms I would want to keep a copy of though. If you can provide some sort of API to permit me to download videos, trigger archivals via webhooks, and make certain archives as high-importance to peg them from prunes, I'd be interested in all that.
I replied to your email at the time. Whitelisting your IPs from the captchas in the archiving page would probably be enough to allow you to automate the archivals.
Are you running servers yourself or is this all on an S3 Cloud? It would also help to be clear with goals.
It's rented storage servers, so not my own physical hardware. That adds to the costs, of course. At the same time, right now isn't the best time to transition to physical servers, considering the prices and all.
Getting one A310 or A380 and having it chug through 4 of your biggest H264 videos would probably buy you a lot of space.
I'm already getting the lowest codec. The default quality is 480p, but it's lowered to 360p if video is longer than 15 minutes.
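The rule described above (480p by default, 360p once a video runs past 15 minutes) is simple enough to write down. This is only a sketch of that rule. Feeding it an ISO 8601 duration string like the `PT14M21S` value YouTube's API returns is my assumption about where the length would come from, and both function names are made up.

```python
import re

# Matches durations such as PT14M21S, PT1H2M, P1DT3H
_DURATION = re.compile(r"P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?")


def duration_seconds(iso_duration: str) -> int:
    """Convert an ISO 8601 duration (days/hours/minutes/seconds only) to seconds."""
    match = _DURATION.fullmatch(iso_duration)
    if match is None:
        raise ValueError(f"unsupported duration: {iso_duration!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def target_height(seconds: int) -> int:
    """480p by default, 360p when the video is longer than 15 minutes."""
    return 360 if seconds > 15 * 60 else 480


print(target_height(duration_seconds("PT14M21S")))  # 480
print(target_height(duration_seconds("PT1H")))      # 360
```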
Something I thought about a few days ago is showing an advertisement while waiting for a video to be downloaded.
Preservetube is too shady for legitimate ad platforms, and too non-shady for shady ad platforms. Banner ads on shady ad platforms, which would only be gambling, pay little to nothing.
What percent of your storage is used by single uploads vs channel uploads? I would bet 90% or more of your storage is caused by people putting in entire channels, while people putting in individual videos are more selective and using a relatively small amount of your space. Consider making entire channel downloads subscriber only.
This is the main reason channel archives aren't full channel archives, they're only the first 5 videos. Surprisingly enough, most people archiving whole channels do it video by video. For example, the one guy that decided it was worth his time to archive an entire channel worth of Brazilian cartoons did it that way.
With pricing like $0.00099 per GB ($0.99 per TB) per month, you could save on monthly costs while long term programmatic solutions are developed (pruning slop, etc)
Though as I think about it more, it might work, assuming people are willing to contribute. Make it obvious the goal is long term archiving, not a proxy, and that unviewed videos will go into the archive and cost money to retrieve. Let people buy credits, and give a "I want this eventually" vs "I want this ASAP", the former bundles requests together into 1TB bulk requests and then emails the people when it's live for a lower price.
By detecting when media on YouTube becomes unavailable and bringing your archived copy back from Near- or Off-line storage to be served up, you minimize the amount of expensive Online storage you need. The trade-off is more site logic needed to detect unavailable content and bring cold copies back from storage to live.
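A minimal sketch of the site logic this would need: given which archives currently sit in cold storage, pick the ones whose YouTube original is gone and queue them for restore. The `"cold"`/`"online"` labels and the `is_available` callback are hypothetical. How availability is actually checked is left open on purpose.

```python
from typing import Callable, Dict, List


def plan_restores(storage_state: Dict[str, str],
                  is_available: Callable[[str], bool]) -> List[str]:
    """Return IDs of cold-stored archives whose YouTube original is no longer available."""
    return [
        video_id
        for video_id, state in storage_state.items()
        if state == "cold" and not is_available(video_id)
    ]


# Stand-in for a real availability check, for demonstration only.
still_on_youtube = {"aaa", "ccc"}
state = {"aaa": "cold", "bbb": "cold", "ccc": "online", "ddd": "online"}

print(plan_restores(state, lambda vid: vid in still_on_youtube))  # ['bbb']
```

Only `bbb` comes back: it is in cold storage and missing from YouTube. `ddd` is also missing but is already online, so nothing needs to be retrieved for it.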
I'll have to actually look into this, ty.
You also pay through the nose to retrieve data. You only looked at the monthly storage rate. Glacier is supposed to be insurance / never need it type deal. Braindead suggestion.
They still make a good point, especially considering the queued downloading for cheaper pricing.
In fact right now there are at least 3 videos of little girls on that page that someone archived... It's weird as fuck, and I don't think that stuff needs to be saved.
Oh god. I blocked saving from Tor IPs ages ago to try and avoid this happening, but people will be creeps from anywhere I guess. I'll have to look into this.
 
Preservetube is too shady for legitimate ad platforms, and too non-shady for shady ad platforms. Banner ads on shady ad platforms, which would only be gambling, pay little to nothing.
So when you had ads running on the site they made no money? Could Rumble's advertising platform work? Playwire? (Just suggestions from a tard.)

I'll have to actually look into this, ty.
Something to consider for this, this video (archive.today) has been removed via a DMCA takedown. Yet on this site it still shows that it's available on YouTube, despite not being playable and showing the DMCA takedown message. Not sure how you plan on detecting if videos have been removed or not, but figured this could be helpful info.
1775158387712.png 1775158441678.png
 
Something to consider for this, this video (archive.today) has been removed via a DMCA takedown. Yet on this site it still shows that it's available on YouTube, despite not being playable and showing the DMCA takedown message. Not sure how you plan on detecting if videos have been removed or not, but figured this could be helpful info.
Disclaimer: I have not looked deeply into this, but there may be a correlation between a "takedown" and "country restriction" (not the same thing, but with similar results), meaning, it's possible that a video that has been taken down like your example, also shows up as country restricted according to YouTube's API.

So now, your video's response for "contentDetails" is:
JSON:
{
  "kind": "youtube#videoListResponse",
  "etag": "lVE0OS09pOtTR0nAFrssp7-pUuk",
  "items": [
    {
      "kind": "youtube#video",
      "etag": "feysyTgZylnq3ecOLybqybrE9gk",
      "id": "oNy7SxHo974",
      "contentDetails": {
        "duration": "PT14M21S",
        "dimension": "2d",
        "definition": "hd",
        "caption": "false",
        "licensedContent": false,
        "regionRestriction": {
          "blocked": [
            "AD",
            "AE",
            "AF",
            "AG",
            "AI",
            "AL",
            "AM",
            "AO",
            "AQ",
            "AR",
            "AS",
            "AT",
            "AU",
            "AW",
            "AX",
            "AZ",
            "BA",
            "BB",
            "BD",
            "BE",
            "BF",
            "BG",
            "BH",
            "BI",
            "BJ",
            "BL",
            "BM",
            "BN",
            "BO",
            "BQ",
            "BR",
            "BS",
            "BT",
            "BV",
            "BW",
            "BY",
            "BZ",
            "CA",
            "CC",
            "CD",
            "CF",
            "CG",
            "CH",
            "CI",
            "CK",
            "CL",
            "CM",
            "CN",
            "CO",
            "CR",
            "CU",
            "CV",
            "CW",
            "CX",
            "CY",
            "CZ",
            "DE",
            "DJ",
            "DK",
            "DM",
            "DO",
            "DZ",
            "EC",
            "EE",
            "EG",
            "EH",
            "ER",
            "ES",
            "ET",
            "FI",
            "FJ",
            "FK",
            "FM",
            "FO",
            "FR",
            "GA",
            "GB",
            "GD",
            "GE",
            "GF",
            "GG",
            "GH",
            "GI",
            "GL",
            "GM",
            "GN",
            "GP",
            "GQ",
            "GR",
            "GS",
            "GT",
            "GU",
            "GW",
            "GY",
            "HK",
            "HM",
            "HN",
            "HR",
            "HT",
            "HU",
            "ID",
            "IE",
            "IL",
            "IM",
            "IN",
            "IO",
            "IQ",
            "IR",
            "IS",
            "IT",
            "JE",
            "JM",
            "JO",
            "JP",
            "KE",
            "KG",
            "KH",
            "KI",
            "KM",
            "KN",
            "KP",
            "KR",
            "KW",
            "KY",
            "KZ",
            "LA",
            "LB",
            "LC",
            "LI",
            "LK",
            "LR",
            "LS",
            "LT",
            "LU",
            "LV",
            "LY",
            "MA",
            "MC",
            "MD",
            "ME",
            "MF",
            "MG",
            "MH",
            "MK",
            "ML",
            "MM",
            "MN",
            "MO",
            "MP",
            "MQ",
            "MR",
            "MS",
            "MT",
            "MU",
            "MV",
            "MW",
            "MX",
            "MY",
            "MZ",
            "NA",
            "NC",
            "NE",
            "NF",
            "NG",
            "NI",
            "NL",
            "NO",
            "NP",
            "NR",
            "NU",
            "NZ",
            "OM",
            "PA",
            "PE",
            "PF",
            "PG",
            "PH",
            "PK",
            "PL",
            "PM",
            "PN",
            "PR",
            "PS",
            "PT",
            "PW",
            "PY",
            "QA",
            "RE",
            "RO",
            "RS",
            "RU",
            "RW",
            "SA",
            "SB",
            "SC",
            "SD",
            "SE",
            "SG",
            "SH",
            "SI",
            "SJ",
            "SK",
            "SL",
            "SM",
            "SN",
            "SO",
            "SR",
            "SS",
            "ST",
            "SV",
            "SX",
            "SY",
            "SZ",
            "TC",
            "TD",
            "TF",
            "TG",
            "TH",
            "TJ",
            "TK",
            "TL",
            "TM",
            "TN",
            "TO",
            "TR",
            "TT",
            "TV",
            "TW",
            "TZ",
            "UA",
            "UG",
            "UM",
            "US",
            "UY",
            "UZ",
            "VA",
            "VC",
            "VE",
            "VG",
            "VI",
            "VN",
            "VU",
            "WF",
            "WS",
            "YE",
            "YT",
            "ZA",
            "ZM",
            "ZW"
          ]
        },
        "contentRating": {},
        "projection": "rectangular"
      }
    }
  ],
  "pageInfo": {
    "totalResults": 1,
    "resultsPerPage": 1
  }
}
Which you can also see in this website (same thing, but 3rd party that also provides visuals):
https://unblockvideos.com/youtube-video-restriction-checker/
restricted.png

If and only if this is the case, where a takedown always means complete country restriction for practical purposes, this could be one method.

However, you may see in that map that a small portion is green. This is because YouTube doesn't handle Kosovo, Somaliland, and N. Cyprus properly: they show as available when in truth they're not. It probably has something to do with these countries not being universally considered sovereign or independent (that's the main common denominator between them), so for some reason YT tells you the video is available there.

Or you can simply do this: if the video is not available in your country, then serve the video. I'm not very knowledgeable in these things, but I'm mentioning it as something that may be useful. You will have to determine whether the first premise is true: takedown → full country restriction (always).
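For what it's worth, the "not available in your country" check can be sketched against a response shaped like the `contentDetails` JSON above. The function name and the trimmed-down sample are mine. It handles both the `blocked` list shown in that response and the `allowed` list the API can return instead. Treating an empty `items` list as "not viewable" is an assumption about how removed or private videos show up.

```python
def is_viewable_in(response: dict, country: str) -> bool:
    """Check a videos.list-style response for whether `country` can view the video."""
    items = response.get("items", [])
    if not items:
        return False  # assumption: no item returned means the video is gone or private
    restriction = items[0].get("contentDetails", {}).get("regionRestriction")
    if restriction is None:
        return True  # no regional restriction at all
    if "allowed" in restriction:
        return country in restriction["allowed"]
    if "blocked" in restriction:
        return country not in restriction["blocked"]
    return True


# Shortened stand-in for the full response above.
sample = {
    "items": [
        {"contentDetails": {"regionRestriction": {"blocked": ["DE", "US"]}}}
    ]
}

print(is_viewable_in(sample, "US"))  # False
print(is_viewable_in(sample, "FR"))  # True
```

This only tells you what the API reports for one country. Whether a takedown always shows up as a block everywhere is still the open question.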
 