How do you keep dats from taking up too much space?

If you have a Dat version of Wikipedia, every time you edit the website aren't you creating a duplicate of the entire website, just with a new version number and some small changes?

If you make 1000 changes, that’s 1000 versions of the website, right? That’s a lot of storage space used up for just one website :thinking:

Only the files that change from version to version get new copies saved. However, Dat saves an entire new copy of each changed file, not just the changes (which is roughly how e.g. git does it). So changing the copyright footer on a website would generate new copies of all the pages and result in the Dat archive roughly doubling in size.
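
To make that concrete, here's a rough sketch using the dat-node module (the directory path and file contents are made up). It just illustrates the behaviour described above: rewriting a file appends a complete new copy of it to the archive's content feed, so the stored byte count grows even for a tiny edit.

```js
// Minimal sketch, assuming the dat-node module; paths and contents are hypothetical.
var Dat = require('dat-node')

Dat('./my-site', function (err, dat) {
  if (err) throw err
  var archive = dat.archive // the underlying hyperdrive instance

  var page = '<html>...lots of page content... © 2017</html>' // imagine a full HTML page

  archive.writeFile('/index.html', page, function (err) {
    if (err) throw err
    console.log('version', archive.version, 'bytes stored', archive.content.byteLength)

    // Change only the footer year: the whole file is appended again,
    // and the old copy is kept around for history.
    archive.writeFile('/index.html', page.replace('2017', '2018'), function (err) {
      if (err) throw err
      console.log('version', archive.version, 'bytes stored', archive.content.byteLength)
      // byteLength roughly doubles even though only four characters changed
    })
  })
})
```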

There isn’t currently a solution to this in Dat.
There’s no compression or data deduplication to mitigate the effects either.

I’ve tried maintaining a Dat version of my own websites, but stopped because the archives consumed an inordinate amount of disk space. More on this.

Users normally only download the parts of a Dat archive that their browser has requested (sparse mode), so they don't need to fetch all the previous revisions of files. However, there's currently a bug in the Windows implementation that causes a Dat archive to take up the full size of the entire archive on disk even when only a few kilobytes of data have actually been downloaded.
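
For reference, a minimal sketch of sparse mode with the dat-node API (the key and file name below are placeholders): with `sparse: true`, only the blocks for files you actually read get fetched from peers.

```js
// Sketch of a sparse download, assuming dat-node; the key is a placeholder.
var Dat = require('dat-node')

var key = '<64-character-dat-key>' // hypothetical key of the archive to clone

Dat('./download-dir', { key: key, sparse: true }, function (err, dat) {
  if (err) throw err
  dat.joinNetwork() // connect to peers sharing this archive

  // Only the blocks needed to satisfy this read are requested,
  // not the whole archive or its history.
  dat.archive.readFile('/index.html', 'utf-8', function (err, content) {
    if (err) throw err
    console.log(content)
  })
})
```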

Right. This is what I was afraid of.

Now I’m thinking about whether versioning is really all that important after all…

Anyone have thoughts on how to solve this conundrum? Can versioning be optional with Dat?

See the discussion in this thread:


One thing you can do is set `latest: true` on your Dat if you're using the API; it'll discard any old changes and save you some storage space. The CLI does this by default.
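
For example, a minimal sketch with the dat-node API (the directory path is made up); as noted above, the CLI already behaves this way, so this only matters if you're wiring up Dat in your own code.

```js
// Sketch of latest mode, assuming dat-node; keeps only the newest copy of each
// file on disk instead of retaining every historical version.
var Dat = require('dat-node')

Dat('./my-site', { latest: true }, function (err, dat) {
  if (err) throw err
  dat.importFiles()  // import and share the current files in ./my-site
  dat.joinNetwork()
  console.log('sharing dat://' + dat.key.toString('hex'))
})
```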