The problem with ZFS snapshots

NAS · December 3, 2024

Make sure you only restore if you know your data exists in that snapshot; otherwise, poof, it's gone.

Unlike traditional approaches built on incremental or differential backups, with ZFS you can't arbitrarily restore to any snapshot on the timeline.

Consider the following scenario.

1. You're on vacation

2. Your home network is hit by ransomware. You're not quite sure what day it happened.

3. You restore to the beginning of the week and it works as expected, but wait, all your vacation photos are missing! The photos were backed up near the end of the week, just before the ransomware got onto your network.

4. Oh no, all your later snapshots are now invalid (because of the first restore), and you can't restore the data from later in the week.
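The scenario above can be sketched as a toy model. This is not how ZFS is implemented; it only mirrors the documented behaviour that rolling back past the most recent snapshot requires destroying the snapshots in between. The `Dataset` class, its methods, and the snapshot names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy model of one dataset and its snapshot timeline (not real ZFS)."""
    files: set = field(default_factory=set)
    # (name, frozen copy of the file set), oldest first
    snapshots: list = field(default_factory=list)

    def snapshot(self, name):
        self.snapshots.append((name, frozenset(self.files)))

    def rollback(self, name):
        """Revert to `name` and drop every snapshot taken after it."""
        names = [n for n, _ in self.snapshots]
        index = names.index(name)  # raises ValueError if the snapshot is gone
        self.files = set(self.snapshots[index][1])
        destroyed = names[index + 1:]
        del self.snapshots[index + 1:]
        return destroyed


nas = Dataset()
nas.files.add("documents")
nas.snapshot("monday")
nas.files.add("vacation-photos")
nas.snapshot("friday")            # last good state, photos included
nas.files = {"encrypted-junk"}    # ransomware hits
nas.snapshot("sunday")

destroyed = nas.rollback("monday")  # guessed too early
```

In this model, `destroyed` holds the `friday` and `sunday` snapshots, and a second `nas.rollback("friday")` fails because that snapshot no longer exists.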

Your vacation pictures are now lost forever! Does this seem confusing?

How is HexOS going to help protect users from this kind of scenario?

Replication doesn't fully solve this issue either if the user doesn't know exactly which snapshot their data exists in.

Magnus · December 3, 2024

Hi there! While I don't have an answer for you now, this is a good point we need to consider when developing snapshots. Thank you!

joshdev · December 3, 2024

Hey,

the issue you are describing is not really present. Take a look at this explanation: all the information you are asking for is already present and accessible in the filesystem. Making it accessible to the user in a nice way is the only challenge here. It could be implemented in a similar way to Apple's Time Machine history view. If you have more questions, feel free to ask!

Hope this helps a little 🙂

NAS · December 7, 2024

On 12/3/2024 at 3:54 PM, joshdev said:

Hey,

the issue you are describing is not really present. Take a look at this explanation: all the information you are asking for is already present and accessible in the filesystem. Making it accessible to the user in a nice way is the only challenge here. It could be implemented in a similar way to Apple's Time Machine history view. If you have more questions, feel free to ask!

Hope this helps a little 🙂

Help me understand why the issue isn't really present. I get that you could implement a file browser, but I'm not sure how that solves the issue. I'm not familiar with Apple's Time Machine, so I can't draw on that as an experience.

So, there are three issues at play here.

1. Users don't expect a restore to invalidate backups. (Am I wrong that the user can accidentally invalidate snapshots?)

2. The average user is thinking about a time and date to restore to, but at the same time wondering what data might be lost. A file browser would help, but it would be less effective for large amounts of data or for applications.

3. When restoring applications, a user is going to rely on a time and date when the application last worked. That may involve guesswork on their part, which could lead to invalidated snapshots.

Expanding on 3: this may be especially true on a multi-user system running apps, since important data might be saved without the NAS owner's knowledge. Consider the following scenario.

1. Plex becomes corrupt, so the NAS owner restores back to when they think it was working.

2. Uncle Bob has his own DVD collection. Bob says, "Now a quarter of my DVD collection is gone."

3. The NAS owner now wants to restore a later snapshot than the one they restored initially.

I can see how a file browser that's transparent to the end user could be really helpful, especially a file explorer that can also extract files out of a snapshot. For example, if you have to restore to an earlier point but an important file was saved in a later snapshot, you could rescue that file.

I guess at the end of the day, you can't experiment with restoring to different points on the timeline without putting your data at risk. You can go back incrementally, but you can't go back and then forward.

Edited December 7, 2024 by NAS

NAS · December 7, 2024

Well, beyond HexOS: in an ideal world, ZFS would allow you to traverse forward or backward, restoring anywhere on the timeline of snapshots without invalidating snapshots (think git). Snapshots are simply a reference to the data. If the user could experiment by actually restoring, then after multiple restores, once they're comfortable with the situation, they could commit to that snapshot, only then discarding all irrelevant data. That would be a much safer mechanism.

NAS · December 7, 2024

It's also important to note that files, in the traditional sense, aren't the only thing of value that can be lost. Say you finished tagging 1,000 photos in your collection through Immich. That's metadata in a database, which is not visible through a file explorer.

joshdev · December 7, 2024

10 hours ago, NAS said:

Well, beyond HexOS: in an ideal world, ZFS would allow you to traverse forward or backward, restoring anywhere on the timeline of snapshots without invalidating snapshots (think git). Snapshots are simply a reference to the data. If the user could experiment by actually restoring, then after multiple restores, once they're comfortable with the situation, they could commit to that snapshot, only then discarding all irrelevant data. That would be a much safer mechanism.

The thing is, ZFS allows just that!

If you had read the info I linked to in my previous post, you would already know that you can access every snapshot individually because it is browsable. Hence you can also find out which point you want to restore the whole dataset to.
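As a rough sketch of what "browsable" means in practice: ZFS exposes each snapshot of a dataset as a read-only directory under `.zfs/snapshot/<name>` at the dataset's mountpoint. Assuming that layout, a few lines of Python can report which snapshots still contain a given file. The function name and the example paths are made up for illustration.

```python
from pathlib import Path


def snapshots_containing(mountpoint, relative_path):
    """Return names of snapshots in which `relative_path` exists, sorted by name."""
    snapshot_root = Path(mountpoint) / ".zfs" / "snapshot"
    if not snapshot_root.is_dir():
        return []  # not a ZFS mountpoint, or no snapshot directory available
    return sorted(
        snap.name
        for snap in snapshot_root.iterdir()
        if (snap / relative_path).exists()
    )
```

A call such as `snapshots_containing("/mnt/tank/photos", "vacation/IMG_0001.jpg")` (hypothetical paths) would list every snapshot the file can be copied out of, without rolling anything back. Note that the result is sorted by name, which only matches chronological order if the snapshot names sort that way.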

Additionally you can simply clone a snapshot state to a new dataset so the old dataset with all snapshots is preserved. You could then rename the original to [something]-old and rename the new one to the original name. All the functionality you require is there in ZFS. That is the point I am trying to make. It is all a question of breaking that out to the user.
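The clone-and-rename approach could look roughly like the following. This is only a sketch: it builds the `zfs clone` and `zfs rename` command lines and prints them rather than running anything, and the `-restored` temporary name, the `tank/photos` dataset, and the `friday` snapshot are assumptions chosen for the example.

```python
def clone_swap_commands(dataset, snapshot, suffix="-old"):
    """Build the zfs commands for a clone-and-rename restore; nothing is executed."""
    restored = f"{dataset}-restored"  # temporary name, chosen for this sketch
    return [
        # 1. Clone the chosen snapshot into a new, writable dataset.
        ["zfs", "clone", f"{dataset}@{snapshot}", restored],
        # 2. Move the original dataset (with all its snapshots) out of the way.
        ["zfs", "rename", dataset, f"{dataset}{suffix}"],
        # 3. Give the clone the original name.
        ["zfs", "rename", restored, dataset],
    ]


for command in clone_swap_commands("tank/photos", "friday"):
    print(" ".join(command))
```

Because no snapshot is destroyed, picking a different snapshot later stays possible. The clone does remain dependent on its origin snapshot, so the renamed original cannot simply be destroyed while the clone exists.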

I know this is hard, but please take the time to read what others reply with. It makes it much easier to discuss desired implementations, and the result is much more likely to reflect what you actually want.
I took the time to write all this because I think at its core your request is valid and a good feature!

Edited December 7, 2024 by joshdev

NAS · December 14, 2024

I appreciate you taking the time to respond, and please understand that I did read your initial reply before posting. I tried to express these concerns based on how an average user would understand them, but that's not helpful now. My overall confusion and frustration with ZFS is why I became a HexOS customer. I didn't feel I could keep my data, and especially my applications, safe with TrueNAS when trying to juggle replication, snapshots and clones.

```

The zfs rollback command can be used to discard all changes made since a specific snapshot. The file system reverts to its state at the time the snapshot was taken. By default, the command cannot roll back to a snapshot other than the most recent snapshot.

To roll back to an earlier snapshot, all intermediate snapshots must be destroyed. You can destroy earlier snapshots by specifying the -r option.

If clones of any intermediate snapshots exist, the -R option must be specified to destroy the clones as well.

The file system that you want to roll back must be unmounted and remounted, if it is currently mounted. If the file system cannot be unmounted, the rollback fails. The -f option forces the file system to be unmounted, if necessary.

Clones can only be created from a snapshot. When a snapshot is cloned, an implicit dependency is created between the clone and snapshot. Even though the clone is created somewhere else in the dataset hierarchy, the original snapshot cannot be destroyed as long as the clone exists. The origin property exposes this dependency, and the zfs destroy command lists any such dependencies, if they exist.

Clones do not inherit the properties of the dataset from which it was created. Use the zfs get and zfs set commands to view and change the properties of a cloned dataset. For more information about setting ZFS dataset properties, see Setting ZFS Properties.

Because a clone initially shares all its disk space with the original snapshot, its used property is initially zero. As changes are made to the clone, it uses more space. The used property of the original snapshot does not consider the disk space consumed by the clone.

```

- Correct me if I'm wrong: when a restore occurs, must the intermediate references (clones or snapshots) be destroyed? Snapshots or clones are considered intermediate if they were created after the snapshot being restored.

- I like the idea of giving the user a simplified file browser to explore snapshots before they commit to a restore. That's great!

- However, visibility into the file system does not help when data is stored in a corrupt database or config of an application. The user's only choice is simply to restore and see if the app works.

Hopefully this shows more of my concern, even if it comes from not fully understanding ZFS. If you say I'm wrong and that HexOS has this covered from both a data and an application standpoint, I will take your word for it.

NAS · December 15, 2024

On 12/7/2024 at 4:54 AM, joshdev said:

simply clone a snapshot state to a new dataset so the old dataset with all snapshots is preserved

I reached out to another forum for clarification about ZFS. So I understand now that a rollback is a rollback and not a restore. A rollback mounts that read-only snapshot, allowing recovery of files.

I believe my initial concern is still valid. A rollback is only helpful if the user knows explicitly what needs to be restored (even if it is exposed by a file browser).

Two users (A and B) have 4,000 photos in Immich.
User B has been tagging a thousand photos over the course of a week.
The application crashes and is unrecoverable. This goes unnoticed for two weeks.
How does user A know which snapshot to restore when there are many bad snapshots in which the application isn't working?

Assume they don't have deep knowledge of the application (looking at logs; most people don't). It seems to me the user would have to use trial and error, with multiple restores of the application, to discover its latest working snapshot. However, this comes at a great risk of losing data. In this case rollback isn't helpful because the corrupt data isn't obvious to the user. They can't tell whether the application can run just by looking at files, nor can they know how much metadata they would lose, because it is stored in a database.

The only thing I can think of is that the user will have to restore incrementally, in chronological order, to find the latest working snapshot for the application without data loss.
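The trial-and-error search described here can be sketched as a loop that checks snapshots newest first and stops at the first one that works. This is a sketch under assumptions: `is_working` stands in for a hypothetical health check (for example, starting the app against a throwaway clone of that snapshot), and the snapshot names are invented. Nothing here calls ZFS.

```python
def latest_working_snapshot(snapshots, is_working):
    """Return the newest snapshot that passes `is_working`, else None.

    `snapshots` is ordered oldest first; `is_working` is a hypothetical
    health check, ideally run against a clone so nothing is rolled back.
    """
    for name in reversed(snapshots):
        if is_working(name):
            return name
    return None


timeline = ["week1", "week2", "week3", "week4"]
healthy = {"week1", "week2"}  # pretend the app broke after week2
found = latest_working_snapshot(timeline, lambda name: name in healthy)
```

Checking newest first means the first hit is also the one that loses the least data, and running the check against clones rather than rollbacks leaves the timeline intact while searching.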

NAS · December 15, 2024

Clarification to the post above: restore in reverse chronological order one at a time.

If that's true, can we expect an average HexOS user to follow that practice?

sigh fyi... the edit timeout needs to be removed from the forums.

Edited December 15, 2024 by NAS
