#archiving


"This destruction reveals not only the brutality of war, but also a long-standing neglect of the importance of digital archiving and the use of technical advances to preserve documents and archives. This neglect threatens to lose a history that could have been documented for the present and built upon for the future."

dabangasudan.org/en/all-news/a

Dabanga Radio TV Online · “As the world celebrates Artificial Intelligence, Sudanese journalism loses memory in war”, prepared by Al-Alaq Centre for Press Services for Sudan Media Forum.

New Kitten Release 🥳

• Implements cascading archives support

kitten.small-web.org/reference

Cascading archives¹ are useful if you have a static archive of the old version of your site and you don’t want to host it somewhere else and use the 404→307 (evergreen web) technique (kitten.small-web.org/reference). (The latter is useful when the old version of your site is a dynamic site and you cannot take a static archive of it.)

If a URL cannot be found on your app, Kitten will try it in the archive folders:

__archive__1
__archive__2
__archive__3

(In that order.)

So you can have up to three older static versions of your site served without breaking older URLs, unless they are shadowed by newer URLs in your site/app.
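Conceptually, the lookup goes something like this (a shell illustration only; the request path and the echo are stand-ins for what Kitten does internally):

# Try the site/app itself first, then each archive folder, in order.
path=/old-post/index.html
for dir in . __archive__1 __archive__2 __archive__3; do
  if [ -f "${dir}${path}" ]; then
    echo "would serve ${dir}${path}"
    break
  fi
done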

Enjoy!

:kitten:💕

¹ This is a feature that I originally implemented in Site.js (that’s going to be shut down tomorrow when Let’s Encrypt stops issuing certificates with OCSP-stapling – I don’t have the bandwidth to maintain two servers/frameworks; Kitten is Site.js’s successor). I’m planning on implementing this differently in Kitten going forward (so you can use the Settings interface to upload a zipped archive and it will serve it) but I need this for my own site for tomorrow’s shutdown so we have this simpler implementation in the meanwhile. Leaving things to the last minute? Who? Me? Never! ;)

Ooh, what’s this?… Look Over There!
(With apologies to Jaida Essence Hall)

So the little app I teased earlier is ready and deployed and I have our own instance running at:

look-over-there.small-web.org

Look Over There! lets you forward multiple domains to different URLs with full HTTPS support.

Why?

We have a number of older sites that are becoming a chore/expensive to maintain and yet I don’t want to break the web. So I thought, hey, I’ll just use the “url forwarding” feature of my domain registrar to forward them to their archived versions on archive.org.

Ah, not so fast, young cricket… seems some domain registrars’ implementations of this feature do not work if the domain being forwarded is accessed via HTTPS (yes, in 2025).

So, given Kitten¹ uses Auto Encrypt² to automatically provision Let’s Encrypt certificates, I added a domain forwarding feature to it and created Look Over There! as a friendly/simple app that provides a visual interface to it.

To see it in action, hit cleanuptheweb.org and you should get forwarded to the archived version of it on archive.org. I’m going to be adding more of our sites to the list in the coming days as part of an effort to reduce my maintenance load and cut down our expenses at Small Technology Foundation.
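You can also check a forward from the command line with curl (-s silent, -I headers only, -L follow redirects); you should see a redirect status and a location header pointing at archive.org:

curl -sIL https://cleanuptheweb.org/ | grep -iE '^(HTTP|location)'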

Since it’s Small Web, this particular instance is just for us. However, you can run your own copy on a VPS (or even a little single-board computer at home, etc.). A link to the source code repository is on the site. Once Domain³ is ready for use (later this year 🤞), setting up your own instance of a Small Web app on your own server will take less than a minute.

I hope this little tool, along with the 404→307 (evergreen web) technique⁴, helps us to nurture an evergreen web and avoid link rot. (And the source code, as little as there is because Kitten does so much for you, is a good resource if you want to learn about Kitten’s new class-based component and page model which I haven’t yet had a chance to properly document.)

Enjoy!

:kitten:💕

¹ kitten.small-web.org
² codeberg.org/small-tech/auto-e
³ codeberg.org/domain/app
⁴ 4042307.org

I’m going through our events page on the Small Technology Foundation web site¹ and porting the entries there to the new version of our web site that I’m building in Kitten², and it’s depressing how many event sites have just disappeared.

Thank goodness for archive.org.

¹ small-tech.org
² kitten.small-web.org

Small Technology Foundation · Home: “Hello! We’re a tiny and independent two-person not-for-profit based in Ireland. We are building the Small Web. No, it’s not web3, it’s web0.”

This thread is going to be a bit of a whiny rant, so if you're not up for that kind of thing, best skip it.
A couple months ago a call went out on the interwebz from a researcher worried that the government data his research depended on, many terabytes, might be taken down. He asked people to help him download it as quickly as possible, after which he said he would set up a new centralized location for them to upload it to for permanent storage.
#SafeguardingResearch #archiving
🧵1/?

I've mirrored a relatively simple website (redsails.org; it's mostly text, some images) for posterity via #wget. However, I also wanted to grab snapshots of any outlinks (of which there are many, as citations/references). I couldn't figure out a configuration where wget would do that out of the box without endlessly, recursively spidering the whole internet, so I ended up making a kind-of poor man's #ArchiveBox instead:

# For each URL in others.txt: name a directory after the URL’s SHA-256
# hash, then mirror that one page (and everything it needs) into it,
# stashing a WARC and a log alongside.
for i in $(cat others.txt) ; do
  dirname=$(echo "$i" | sha256sum | cut -d' ' -f 1)
  mkdir -p "$dirname"
  wget --span-hosts --page-requisites --convert-links --backup-converted \
    --adjust-extension --tries=5 --warc-file="$dirname/$dirname" \
    --execute robots=off --wait 1 --waitretry 5 --timeout 60 \
    -o "$dirname/wget-$dirname.log" --directory-prefix="$dirname/" "$i"
done

Basically, there's a list of bookmarks^W URLs in others.txt that I grabbed from the initial mirror of the website with some #grep foo. I want to do as good a mirror/snapshot of each specific URL as I can, without spidering/mirroring endlessly all over. So, I hash the URL and kick off a specific wget job for it that will span hosts, but only for the purposes of making that specific URL as usable locally/offline as possible. I know from experience that this isn't perfect. But... it'll be good enough for my purposes. I'm also stashing a WARC file. Probably a bit overkill, but I figure it might be nice to have.
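The grep foo was roughly of this shape (a sketch, not the exact command):

# Hypothetical reconstruction: pull every absolute URL out of the
# mirrored HTML, de-duplicate, then hand-prune the result.
grep -rhoE "https?://[^\"' <>]+" redsails.org/ | sort -u > others.txt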


The ArchiveTeam Warrior has been running intermittently on my laptop for ten days now.

It downloads stuff and puts it into the Internet Archive.

Everything's fine. It only runs while I use the laptop. I don't notice it. When the laptop goes into standby and wakes up again that doesn't seem to have any adverse effects.

I've downloaded and uploaded gigabytes so far. The top of the leader board for this project is half a petabyte.

Now I'm considering an installation where it could run around the clock. I don't want to increase our household's standby energy consumption too much, so I will see how that goes.

@internetarchive

#archiving
#internetArchive
#DataRescue
#dataPreservation
#digitalPreservation
#archiveTeamWarrior
#archiveTeam

Salon Series 54: Palestine in Print: Publishing and Archiving as Tools for Resistance
Ren Allathkani, MJ Fair, Nicole Kaack, and Kate Laster
Thu, Apr 3, 2025 at 6:00pm–7:30pm PT

Jewish and Palestinian artists and archivists examine the critical role of print media in protecting and archiving endangered voices, histories, and traditions.⁠

Register to attend online or in person: letterformarchive.org/shop/sal

(Images: Kate Laster)

Installed and started the ArchiveTeam Warrior. Very smooth experience.

It downloads stuff and puts it into the Internet Archive.

I took the "ArchiveTeam’s Choice" project and it chose public Telegram channels. It's not taking a lot of bandwidth or memory or space or computing, as far as I can tell. It might take too much of my time and focus if I continue staring at the dashboard to try and figure out what all that stuff is.

warrior.archiveteam.org/

@internetarchive


does someone have a file server hosted at home (or privately enough that it is YOURS) and wants to archive all Linus Tech Tips Floatplane Exclusives??? (in a way I could still download/access them if I want to)

I really need some storage space :'D and I want to archive this forever if possible

it's 154GB of 1080p30fps videos, all the Floatplane exclusives from when they started being a thing, until 20th February (I really could only afford one month so some recent vids are already missing)

(please boost for maximum reach 💛)

Just discovered ArchiveBox — FOSS, self-hosted internet archiving.

The way the web is going, with the US government redacting and outright erasing historic content, publishers segmenting content by region (and also sometimes redacting/censoring it), and CloudFlare shitting all over everything, I think it's time for me to start my #archiving and #DataHoarding journey.

#SelfHosting #SelfHosted #DataHoarder

github.com/ArchiveBox/ArchiveB

GitHub · ArchiveBox/ArchiveBox: 🗃 Open source self-hosted web archiving. Takes URLs/browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more.

- I've set up a VNET thick jail on my FreeBSD NAS.
- The jail has its own IP address on my LAN.
- I declared a devfs ruleset to unhide /dev/tun* for the VNET jail (rough sketch after this list).
- I installed Wireguard in the jail.
- I enabled Wireguard with a ProtonVPN configuration.
- I installed qbittorrent-nox and configured it to use the Wireguard interface.
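The devfs bit is the most arcane step, so here’s roughly what it looks like (the ruleset number 100 and the jail name are placeholders, not necessarily what I used):

# /etc/devfs.rules: unhide tun devices inside the VNET jail.
[devfsrules_vnet_jail_tun=100]
add include $devfsrules_hide_all
add include $devfsrules_unhide_basic
add include $devfsrules_unhide_login
add path 'tun*' unhide

# /etc/jail.conf: point the jail at that ruleset, e.g.
#   mytorrentjail { vnet; devfs_ruleset = 100; ... }

Then restart devfs (service devfs restart) and restart the jail.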

I now have an ISP-proof qBittorrent setup at home with which to torrent Anna's Archive.

Hopefully there is no way that my ISP can get in, otherwise I'll get legal scare letters that threaten to put me in a jail myself.

Honestly I feel like this was more straightforward to do than with LXC containers.

#AskFedi
How to archive Google Group messages? I want to archive #BosNet messages (which were copied into GG), retaining the thread structure, if possible, but if not, an enormous PDF would be better than nothing.
HOW? Can I plug the group page into the Wayback Machine, or would that just save the one page? Would ArchiveWeb.page do this?
Also found the direct link to bit.listserv.bosnet and many others, but can't open the link w/o a reader.
#archiving #ArchiveWeb #UseNet #Bosnia #BalkansNet

Archiving Gmail - help?
I want to download all of my email from the beginning of my account until 2023 and then wipe it off the Gmail servers. I am not sure how to do this, because every solution I have found so far synchronizes my storage solution with my Gmail account.

With Thunderbird I can go POP to delete all new incoming messages, but that is kind of the opposite of what I am looking for.

Ideas?
#email #privacy #archiving