Have you considered Karakeep (formerly Hoarder)? It does all of this really well: drop it a URL and it saves a copy. It has lists and tagging (which can be done by AI if you want), plus iOS and Android apps and browser extensions that make saving stuff super easy.
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 1 · 1 month ago
Broadly similar from a quick glance: https://www.amazon.pl/s?k=m-disc+blu+ray
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 1 · 1 month ago
My options look like this:
https://allegro.pl/kategoria/nosniki-blu-ray-257291?m-disc=tak
Exchange rate is 3.76 PLN to 1 USD, which is actually the best I’ve seen in years
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 1 · 1 month ago
I only looked at how ZFS tracks checksums because of your suggestion! Hashing 2TB will take a minute, would be nice to avoid.
Nushell is neat, I’m using it as my login shell. Good for this kind of data-wrangling but also a pre-1.0 moving target.
traches@sh.itjust.works to Selfhosted@lemmy.world • Self-Hosted podcast has announced that episode 150 is their last. English 17 · 1 month ago
Tailscale deserves it, bitcoin absolutely does not
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 2 · 1 month ago
Where I live (not the US) I’m seeing closer to $240 per TB for M-disc. My whole archive is just a bit over 2TB, though I’m also including exported jpgs in case I can’t get a working copy of darktable that can render my edits. It’s set to save xmp sidecars on edit so I don’t bother with backing up the database.
I mostly wanted a tool to divide up the images into disk-sized chunks, and to automatically track changes to existing files, such as sidecar edits or new photos. I’m now seeing I can do both of those and still get files directly on the disk, so that’s what I’ll be doing.
I’d be careful with using SSDs for long-term, offline storage. I hear they lose data if not powered for a long time. IMO metadata is small enough to just save a new copy when it changes.
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 1 · 1 month ago
I’ve been thinking through how I’d write this. With so many files it’s probably worth using sqlite, and then I can match them up by joining on the hash. Deletions and new files can be found with different join conditions. I found a tool called ‘hashdeep’ that can checksum everything, though for incremental runs I’ll probably skip hashing if the size, times, and filename haven’t changed. I’m thinking nushell for the plumbing? It runs everywhere, though they have breaking changes frequently. Maybe rust?
ZFS checksums are done at the block level, and after compression and encryption. I don’t think they’re meant for this purpose.
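To make the "join on the hash" idea concrete, here is a rough Python sketch using the standard-library sqlite3 module. The table names (prev, curr), the function name diff_snapshots, and the four categories are made up for illustration, and the hashes are assumed to have been computed already by something like hashdeep. It is not a finished tool, and duplicate files sharing one hash would show up as multiple rename candidates.

```python
import sqlite3


def diff_snapshots(previous, current):
    """Compare two {path: hash} snapshots using SQLite joins.

    Hypothetical helper: returns a dict with new, deleted, renamed, modified.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE prev (path TEXT PRIMARY KEY, hash TEXT)")
    db.execute("CREATE TABLE curr (path TEXT PRIMARY KEY, hash TEXT)")
    db.executemany("INSERT INTO prev VALUES (?, ?)", list(previous.items()))
    db.executemany("INSERT INTO curr VALUES (?, ?)", list(current.items()))

    # Same path, different hash: the file was edited (e.g. a sidecar change).
    modified = db.execute(
        "SELECT c.path FROM curr c JOIN prev p ON p.path = c.path "
        "WHERE p.hash <> c.hash ORDER BY c.path"
    ).fetchall()

    # Same hash, old path gone, new path appeared: treat as a rename.
    renamed = db.execute(
        "SELECT p.path, c.path FROM curr c JOIN prev p ON p.hash = c.hash "
        "WHERE c.path NOT IN (SELECT path FROM prev) "
        "AND p.path NOT IN (SELECT path FROM curr) ORDER BY c.path"
    ).fetchall()

    # Path and content both unseen before: a new file.
    new = db.execute(
        "SELECT c.path FROM curr c LEFT JOIN prev p ON p.hash = c.hash "
        "WHERE p.hash IS NULL AND c.path NOT IN (SELECT path FROM prev) "
        "ORDER BY c.path"
    ).fetchall()

    # Path and content both missing now: a deletion.
    deleted = db.execute(
        "SELECT p.path FROM prev p LEFT JOIN curr c ON c.hash = p.hash "
        "WHERE c.hash IS NULL AND p.path NOT IN (SELECT path FROM curr) "
        "ORDER BY p.path"
    ).fetchall()
    db.close()

    return {
        "new": [row[0] for row in new],
        "deleted": [row[0] for row in deleted],
        "renamed": [tuple(row) for row in renamed],
        "modified": [row[0] for row in modified],
    }
```

In a real run the two tables would live in a database file that persists between backups instead of being rebuilt in memory each time.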
traches@sh.itjust.works to Selfhosted@lemmy.world • Self-Hosted podcast has announced that episode 150 is their last. English 463 · 1 month ago
Aww, man, I’m conflicted here. On one hand, I’ve enjoyed their work for years and they seem like good dudes who deserve to eat. On the other, they’re AI enthusiast crypto-bros and that’s just fucking exhausting. I deal with enough of that bullshit at work
Edit: rephrase for clarity
traches@sh.itjust.works to Selfhosted@lemmy.world • Is selfhosting your Girlfriend a good idea? 😂 English 31 · 1 month ago
humans are neat
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 1 · 1 month ago
Yeah, you’re probably right. I already bought all the stuff, though. This project is halfway vibes based; something about spinning rust just feels fragile, you know?
I’m definitely moving away from the complex archive split & merge solution.
fpart can make lists of files that add up to a given size, and fd can find files modified since a given date. Little bit of plumbing and I’ve got incremental backups that show up as plain files & folders on a disk.
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 3 · 1 month ago
Ohhh boy, after so many people are suggesting I do simple files directly on the disks I went back and rethought some things. I think I’m landing on a solution that does everything and doesn’t require me to manually manage all these files:
- fd (and any number of other programs) can produce lists of files that have been modified since a given date.
- fpart can produce lists of files that add up to a given size.
- xorrisofs can accept lists of files to add to an iso
So if I fd a list of new files (or don’t for the first backup), pipe them into fpart to chunk them up, and then pass these lists into xorrisofs to create ISOs, I’ve solved almost every problem.
- The disks have plain files and folders on them, no special software is needed to read them. My wife could connect a drive, pop the disk in, and the photos would be right there organized by folder.
- Incremental updates can be accomplished by keeping track of whenever the last backup was.
- The fpart lists are also a greppable index; I can use them to find particular files easily.
- Corruption only affects that particular file, not the whole archive.
- A full restore can be accomplished with rsync or other basic tools.
Downsides:
- Change detection is naive. Just mtime. Good enough?
- Renames will still produce new copies. Solution: don’t rename files. They’re organized well enough, stop messing with it.
- Deletions will be disregarded. I could solve this with some sort of indexing scheme, but I don’t think I care enough to bother.
- There isn’t much rhyme or reason to how fpart splits up files. The first backup will be a bit chaotic. I don’t think I really care.
- If I rsync -a some files into the dataset, which have mtimes older than the last backup, they won’t get slurped up in the next one. Can be solved by checking that all files are already in the existing fpart indices, or by just not doing that.
Honestly those downsides look quite tolerable given the benefits. Is there some software that will produce and track a checksum database?
Off to do some testing to make sure these things work like I think they do!
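For anyone who wants to see the shape of that pipeline without installing anything, here is a rough Python stand-in for the fd and fpart steps. The function names are invented for this sketch, the greedy packing is not how fpart actually splits things, and the disc limit is whatever you pass in. The third helper covers the rsync -a downside by checking paths against earlier lists instead of trusting mtime.

```python
import os
from pathlib import Path


def modified_since(root, since_epoch):
    """Yield (path, size) for files under root with an mtime newer than since_epoch."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            st = path.lstat()  # lstat so a broken symlink does not raise
            if st.st_mtime > since_epoch:
                yield str(path), st.st_size


def chunk_by_size(files, limit_bytes):
    """Greedy first-fit packing of (path, size) pairs into lists of at most limit_bytes."""
    chunks = []  # each entry is [total_bytes, [paths]]
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        if size > limit_bytes:
            raise ValueError(f"{path} does not fit on a single disc")
        for chunk in chunks:
            if chunk[0] + size <= limit_bytes:
                chunk[0] += size
                chunk[1].append(path)
                break
        else:
            # No existing chunk has room, so start a new disc.
            chunks.append([size, [path]])
    return [paths for _total, paths in chunks]


def not_in_index(paths, indexed_paths):
    """Return the paths that never appeared in an earlier list, whatever their mtime."""
    seen = set(indexed_paths)
    return [p for p in paths if p not in seen]
```

Each returned list could be written out one path per line, which doubles as the greppable index, and then handed to xorrisofs. The exact xorrisofs invocation is left out here on purpose.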
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 2 · 1 month ago
Yeah, I already use restic which is extremely similar and I don’t believe it could do this either. Both are awesome projects though
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 1 · 1 month ago
Hey cool, I hadn’t heard of bacula! Looks like a really robust project. I did look into tape storage, but I can’t find a tape drive for a reasonable price that doesn’t have a high jank factor (internal, 5.25" drives with weird enterprise connectors and such).
I’m digging through their docs and I can’t find anything about optical media, except for a page in the manual for an old version saying not to use it. Am I missing something? It seems heavily geared towards tapes.
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 31 · 1 month ago
Can borg back up to write-once optical media spread over multiple disks? I’m looking through their docs and I can’t find anything like that. I see an append-only mode but that seems more focused on preventing hacked clients from corrupting data on a server.
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 2 · 1 month ago
I’m using standard BD-DLs. M-Discs are almost triple the price, and this project is already too costly. I’m not looking for centuries of longevity, I’m using optical media because it’s read-only once written. I read that properly stored Blu-Rays should be good for 10 or 20 years, which is good enough for me. I’ll make another copy when the read errors start getting bad.
Copying files directly would work, but my library is real big and that sounds tedious. I have photos going back to the 80s and curating, tagging, and editing them is an ongoing job. (This data is saved in XMP sidecars alongside the original photos). I also won’t be encrypting or compressing them for the same reasons you mentioned.
For me, the benefit of the archive tool is to automatically split it up into disk-sized chunks. That and to automatically detect changes and save a new version; your first key doesn’t hold true for this dataset. You’re right though, I’m sacrificing accessibility for the rest of the family. I’m hoping to address this with thorough documentation and static binaries on every disk.
traches@sh.itjust.works OP to Selfhosted@lemmy.world • Incremental backups to optical media: tar, dar, or something else? English 2 · 1 month ago
Woah, that’s cool! I didn’t know you could just zfs send anywhere. I suppose I’d have to split it up manually with split or something to get 50GB chunks?
Dar has dar_manager, which you can use to create a database of snapshots and slices that you can use to locate individual files, but honestly if I’m using this backup it’ll almost certainly be a full restore after some cataclysm. If I just want a few files I’ll use one of my other, always-online backups.
Edit: Clicked save before I was finished
I’m more concerned with robustness than efficiency. Dar will warn you about corruption, which should only affect that particular file and not the whole archive. Tar will allow you to read past errors so the whole archive won’t be ruined, but I’m not sure how bad the effects would be. I’m really not a fan of a solution that needs every part of every disk to be read perfectly.
I could chunk them up manually, but we’re talking about 2TB of lumpy data, spread across hundreds of thousands of files. I’ll definitely need some sort of tooling to track changes, I’m not doing that manually and I bounce around the photo library changing metadata all the time.
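On the idea of splitting a zfs send stream into disc-sized pieces: the mechanics are simple enough to sketch in Python, roughly what split does. The function name split_stream and the open_part callback are invented for this example, and it only shows the chunking. Note that it has exactly the weakness described above, since restoring needs every piece to read back perfectly and in order.

```python
def split_stream(stream, chunk_bytes, open_part, buf_bytes=1 << 20):
    """Copy a binary stream into numbered parts of at most chunk_bytes each.

    open_part(index) is a caller-supplied (hypothetical) callback that returns
    a writable binary file object for that part. Returns the number of parts.
    """
    parts = 0
    out = None
    room = 0  # bytes still free in the current part
    while True:
        # Never read more than fits in the current (or next) part.
        to_read = min(buf_bytes, room if room else chunk_bytes)
        data = stream.read(to_read)
        if not data:
            break
        if room == 0:
            # Current part is full (or none is open yet): start the next one.
            if out is not None:
                out.close()
            out = open_part(parts)
            parts += 1
            room = chunk_bytes
        out.write(data)
        room -= len(data)
    if out is not None:
        out.close()
    return parts
```

A caller would pass something like sys.stdin.buffer as the stream and an open_part that opens a numbered file in "wb" mode, with chunk_bytes set a little under the disc capacity.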
traches@sh.itjust.works to Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ@lemmy.dbzer0.com • Books, specifically Discworld? English 4 · 2 months ago
Lol my friend may have seeded it to your friend
traches@sh.itjust.works to Piracy: ꜱᴀɪʟ ᴛʜᴇ ʜɪɢʜ ꜱᴇᴀꜱ@lemmy.dbzer0.com • Books, specifically Discworld? English 16 · 2 months ago
Myanonamouse
traches@sh.itjust.works to Selfhosted@lemmy.world • Would you return a hard drive with 1 uncorrectable error after 130 hours of work? English 1 · 2 months ago
The part I’m calling out as untrue is the „magic 8 ball” comment, because it directly contradicts my own personal lived experience. Yes it’s a lying, noisy, plagiarism machine, but its accuracy for certain kinds of questions is better than a coin flip and the wrong answers can be useful as well.
Some recent examples
- I had it write an excel formula that I didn’t know how to write, but could sanity check and test.
- Worked through some simple, testable questions about setting up project references in a typescript project
- I want to implement URL previews in a web project but I didn’t know what the standard for that is called. Every web search I could think of related to „url previews” is full of SEO garbage I don’t care about, but ChatGPT immediately gave me the correct answer (Open Graph meta tags), easily verified by searching for that and reading the public documentation.
- Naming things is a famously hard problem in programming and LLMs are pretty good at „what’s another way to say” and „what’s it called when” type questions.
Just because you don’t have the problems that LLMs solve doesn’t mean that nobody else does. And also, dude, don’t scold people on the internet. The fediverse has a reputation and it’s not entirely a good one.
NAS at the parents’ house. Restic nightly job, with some plumbing scripts to automate it sensibly.