#archiving


"This destruction reveals not only the brutality of war, but also a long-standing neglect of the importance of digital archiving and the use of technical advances to preserve documents and archives. This neglect threatens to lose a history that could have been documented for the present and built upon for the future."

dabangasudan.org/en/all-news/a

Dabanga Radio TV Online · “As the world celebrates Artificial Intelligence, Sudanese journalism loses memory in war”, prepared by Al-Alaq Centre for Press Services for Sudan Media Forum.

New Kitten Release 🥳

• Implements cascading archives support

kitten.small-web.org/reference

Cascading archives¹ are useful if you have a static archive of the old version of your site and you don’t want to host it somewhere else and use the 404→307 (evergreen web) technique (kitten.small-web.org/reference). (The latter is useful when the old version of your site is a dynamic site and you cannot take a static archive of it.)

If a URL cannot be found on your app, Kitten will try it in the archive folders:

__archive__1
__archive__2
__archive__3

(In that order.)

So you can have up to three older static versions of your site served without breaking older URLs, unless they are shadowed by newer URLs in your site/app.
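Conceptually, the lookup goes something like this (a shell illustration only; the request path and the echo are stand-ins for what Kitten does internally):

# Try the site/app itself first, then each archive folder, in order.
path=/old-post/index.html
for dir in . __archive__1 __archive__2 __archive__3; do
  if [ -f "${dir}${path}" ]; then
    echo "would serve ${dir}${path}"
    break
  fi
done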

Enjoy!

:kitten:💕

¹ This is a feature that I originally implemented in Site.js (that’s going to be shut down tomorrow when Let’s Encrypt stops issuing certificates with OCSP-stapling – I don’t have the bandwidth to maintain two servers/frameworks; Kitten is Site.js’s successor). I’m planning on implementing this differently in Kitten going forward (so you can use the Settings interface to upload a zipped archive and it will serve it) but I need this for my own site for tomorrow’s shutdown so we have this simpler implementation in the meanwhile. Leaving things to the last minute? Who? Me? Never! ;)

Ooh, what’s this?… Look Over There!
(With apologies to Jaida Essence Hall)

So the little app I teased earlier is ready and deployed and I have our own instance running at:

look-over-there.small-web.org

Look Over There! lets you forward multiple domains to different URLs with full HTTPS support.

Why?

We have a number of older sites that are becoming a chore/expensive to maintain and yet I don’t want to break the web. So I thought, hey, I’ll just use the “url forwarding” feature of my domain registrar to forward them to their archived versions on archive.org.

Ah, not so fast, young cricket… seems some domain registrars’ implementations of this feature do not work if the domain being forwarded is accessed via HTTPS (yes, in 2025).

So, given Kitten¹ uses Auto Encrypt² to automatically provision Let’s Encrypt certificates, I added a domain forwarding feature to it and created Look Over There! as a friendly/simple app that provides a visual interface to it.

To see it in action, hit cleanuptheweb.org and you should get forwarded to the archived version of it on archive.org. I’m going to be adding more of our sites to the list in the coming days as part of an effort to reduce my maintenance load and cut down our expenses at Small Technology Foundation.
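You can also check a forward from the command line with curl (-s silent, -I headers only, -L follow redirects); you should see a redirect status and a location header pointing at archive.org:

curl -sIL https://cleanuptheweb.org/ | grep -iE '^(HTTP|location)'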

Since it’s Small Web, this particular instance is just for us. However, you can run your own copy on a VPS (or even a little single-board computer at home, etc.). A link to the source code repository is on the site. Once Domain³ is ready for use (later this year 🤞), setting up your own instance of a Small Web app on your own server will take less than a minute.

I hope this little tool, along with the 404→307 (evergreen web) technique⁴, helps us to nurture an evergreen web and avoid link rot. (And the source code, as little as there is because Kitten does so much for you, is a good resource if you want to learn about Kitten’s new class-based component and page model which I haven’t yet had a chance to properly document.)

Enjoy!

:kitten:💕

¹ kitten.small-web.org
² codeberg.org/small-tech/auto-e
³ codeberg.org/domain/app
⁴ 4042307.org

I’m going through our events page on the Small Technology Foundation web site¹ and porting the entries there to the new version of our web site that I’m building in Kitten², and it’s depressing how many event sites have just disappeared.

Thank goodness for archive.org.

¹ small-tech.org
² kitten.small-web.org

Small Technology Foundation · Home: “Hello! We’re a tiny and independent two-person not-for-profit based in Ireland. We are building the Small Web. No, it’s not web3, it’s web0.”

This thread is going to be a bit of a whiny rant, so if you're not up for that kind of thing, best skip it.
A couple months ago a call went out on the interwebz from a researcher worried that the government data his research depended on, many terabytes, might be taken down. He asked people to help him download it as quickly as possible, after which he said he would set up a new centralized location for them to upload it to for permanent storage.
#SafeguardingResearch #archiving
🧵1/?

I've mirrored a relatively simple website (redsails.org; it's mostly text, some images) for posterity via #wget. However, I also wanted to grab snapshots of any outlinks (of which there are many, as citations/references). I couldn't figure out a configuration where wget would do that out of the box without endlessly, recursively spidering the whole internet, so I ended up making a kind-of poor man's #ArchiveBox instead:

# For each URL in others.txt: name a directory after the URL’s SHA-256
# hash, then mirror that one page (and everything it needs) into it,
# stashing a WARC and a log alongside.
for i in $(cat others.txt) ; do
  dirname=$(echo "$i" | sha256sum | cut -d' ' -f 1)
  mkdir -p "$dirname"
  wget --span-hosts --page-requisites --convert-links --backup-converted \
    --adjust-extension --tries=5 --warc-file="$dirname/$dirname" \
    --execute robots=off --wait 1 --waitretry 5 --timeout 60 \
    -o "$dirname/wget-$dirname.log" --directory-prefix="$dirname/" "$i"
done

Basically, there's a list of bookmarks^W URLs in others.txt that I grabbed from the initial mirror of the website with some #grep foo. I want to do as good a mirror/snapshot of each specific URL as I can, without spidering/mirroring endlessly all over. So, I hash the URL and kick off a specific wget job for it that will span hosts, but only for the purposes of making that specific URL as usable locally/offline as possible. I know from experience that this isn't perfect. But... it'll be good enough for my purposes. I'm also stashing a WARC file. Probably a bit overkill, but I figure it might be nice to have.
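The grep foo was roughly of this shape (a sketch, not the exact command):

# Hypothetical reconstruction: pull every absolute URL out of the
# mirrored HTML, de-duplicate, then hand-prune the result.
grep -rhoE "https?://[^\"' <>]+" redsails.org/ | sort -u > others.txt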


The ArchiveTeam Warrior has been running intermittently on my laptop for ten days now.

It downloads stuff and puts it into the Internet Archive.

Everything's fine. It only runs while I use the laptop. I don't notice it. When the laptop goes into standby and wakes up again that doesn't seem to have any adverse effects.

I've downloaded and uploaded gigabytes so far. The top of the leader board for this project is half a petabyte.

Now I'm considering an installation where it could run around the clock. I don't want to increase our household's standby energy consumption too much, so I will see how that goes.

@internetarchive

#archiving
#internetArchive
#DataRescue
#dataPreservation
#digitalPreservation
#archiveTeamWarrior
#archiveTeam

Salon Series 54: Palestine in Print: Publishing and Archiving as Tools for Resistance
Ren Allathkani, MJ Fair, Nicole Kaack, and Kate Laster
Thu, Apr 3, 2025 at 6:00pm–7:30pm PT

Jewish and Palestinian artists and archivists examine the critical role of print media in protecting and archiving endangered voices, histories, and traditions.⁠

Register to attend online or in person: letterformarchive.org/shop/sal

(Images: Kate Laster)

Installed and started the ArchiveTeam Warrior. Very smooth experience.

It downloads stuff and puts it into the Internet Archive.

I took the "ArchiveTeam’s Choice" project and it chose public Telegram channels. It's not taking a lot of bandwidth or memory or space or computing, as far as I can tell. It might take too much of my time and focus if I continue staring at the dashboard to try and figure out what all that stuff is.

warrior.archiveteam.org/

@internetarchive


does someone have a file server hosted at home (or privately enough that it is YOURS) and wants to archive all Linus Tech Tips Floatplane Exclusives??? (in a way I could still download/access them if I want to)

I really need some storage space :'D and I want to archive this forever if possible

it's 154GB of 1080p30fps videos, all the Floatplane exclusives from when they started being a thing, until 20th February (I really could only afford one month so some recent vids are already missing)

(please boost for maximum reach 💛)

Just discovered ArchiveBox — FOSS, self-hosted internet archiving.

The way the web is going, with the US government redacting and outright erasing historic content, publishers segmenting content by region (and also sometimes redacting/censoring it), and CloudFlare shitting all over everything, I think it's time for me to start my #archiving and #DataHoarding journey.

#SelfHosting #SelfHosted #DataHoarder

github.com/ArchiveBox/ArchiveB

GitHub · ArchiveBox/ArchiveBox: 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more.

- I've set up a VNET thick jail on my FreeBSD NAS.
- The jail has its own IP address on my LAN.
- I declared a devfs ruleset to unhide /dev/tun* for the VNET jail (rough sketch after this list).
- I installed Wireguard in the jail.
- I enabled Wireguard with a ProtonVPN configuration.
- I installed qbittorrent-nox and configured it to use the Wireguard interface.
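The devfs bit is the most arcane step, so here’s roughly what it looks like (the ruleset number 100 and the jail name are placeholders, not necessarily what I used):

# /etc/devfs.rules: unhide tun devices inside the VNET jail.
[devfsrules_vnet_jail_tun=100]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add path 'tun*' unhide

# /etc/jail.conf: point the jail at that ruleset, e.g.
#   mytorrentjail { vnet; devfs_ruleset = 100; ... }

Then restart devfs (service devfs restart) and restart the jail.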

I now have an ISP-proof qBittorrent setup at home with which to torrent Anna's Archive.

Hopefully there is no way that my ISP can get in, otherwise I'll get legal scare letters that threaten to put me in a jail myself.

Honestly I feel like this was more straightforward to do than with LXC containers.

#AskFedi
How to archive Google Group messages? I want to archive #BosNet messages (which were copied into GG), retaining the thread structure, if possible, but if not, an enormous PDF would be better than nothing.
HOW? Can I plug the group page into the Wayback Machine, or would that just save the one page? Would ArchiveWeb.page do this?
Also found the direct link to bit.listserv.bosnet and many others, but can't open the link w/o a reader.
#archiving #ArchiveWeb #UseNet #Bosnia #BalkansNet

Archiving Gmail - help?
I want to download all of my email from the beginning of my account until 2023 and then wipe it off the Gmail servers. I am not sure how to do this, because every solution I have found so far synchronizes my storage solution with my Gmail account.

With Thunderbird I can go POP to delete all new incoming messages, but that is kind of the opposite of what I am looking for.

Ideas?
#email #privacy #archiving