Deleting Data Is Not a Recommended Practice

SEP 09, 2009 2 MIN READ by Abel Avram

Oren Eini, alias Ayende Rahien, encourages developers to avoid database soft deletes leaving the reader with the impression that hard deletes are an option. Reacting to Ayende’s blog post, Udi Dahan strongly advices to completely avoid data deletion.

The advocates of soft delete operations suggest adding an IsDeleted column to the table leaving the data intact. A row is considered as deleted if the IsDeleted flag is set. Ayende considers this approach as “simple, easy to understand, quick to implement and explain” but “quite often, wrong”. The problem is:

that deletion of a row or an entity is rarely a simple event. It effect not only the data in the model, but also the shape of the model. That is why we have foreign keys, to ensure that we don’t end up with Order Lines that don’t have a parent Order. And that is just the simplest of issues. …

When we are dealing with soft deletes, it is easy to get into situations where we have, for all intents and purposes, corrupt data, because Customer’s LastOrder (which is just a tiny optimization that no one thought about) now points to a soft deleted order.

If a developer is requested to delete data from the database, and if soft deletes are not recommended, then he is left with hard deletes. To preserve data integrity, he should delete the row and all related data in cascade. Udi Dahan draws the attention to the fact that the real world does not cascade:

Let’s say our marketing department decides to delete an item from the catalog. Should all previous orders containing that item just disappear? And cascading farther, should all invoices for those orders be deleted as well? Going on, would we have to redo the company’s profit and loss statements?

Heaven forbid.

The problem seems to be with the interpretation of “delete”. Dahan gives the following example:

What I mean by ‘delete’ is that the product should be discontinued. We don’t want to sell this line of product anymore. We want to get rid of the inventory we have, but not order any more from our supplier. The product shouldn’t appear any more when customers do a product search or category listing, but the guys in the warehouse will still need to manage these items in the interim. It’s much shorter to just say ‘delete’ though.

He continues by giving a correct interpretation of user’s intention:

Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.

Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.

Jobs aren’t deleted – they’re filled (or their requisition is revoked).

In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.

Instead of using the IsDeleted flag, Dahan suggests using a field containing the status of the related data: active, discontinued, cancelled, deprecated, etc. Such a status field allows the user to look into the past to trace old data and make decision based on it.

Deleting data may have negative consequences beside breaking data integrity. Dahan’s recommendation is to leave all data in the database: “Don’t delete. Just don’t.”

 

https://www.zhihu.com/collection/845458021

省流版:

Ayende Rahien建议避免软删除,因为此举破坏了关系型数据库中的“关系”:“为了保持数据的完整性,他应该级联删除行和所有相关数据”。

而Udi Dahan表示反对,因为“现实世界并不会自动级联删除”。但是建议不使用IsDeleted标志,而是使用一个包含相关数据状态的字段:活动的、停产的、已取消的、已废弃的等等。

当年类似讨论也有很多,比如这个14年前的StackOverflow帖子:

Physical vs. logical (hard vs. soft) delete of database record?

 

Avoid Soft Deletes

OREN EINI aka Ayende Rahien | Aug 30 2009 | time to read 5 min | 884 words

One of the annoyances that we have to deal when building enterprise applications is the requirement that no data shall be lost. The usual response to that is to introduce a WasDeleted or an IsActive column in the database and implement deletes as an update that would set that flag.

Simple, easy to understand, quick to implement and explain.

It is also, quite often, wrong.

The problem is that deletion of a row or an entity is rarely a simple event. It effect not only the data in the model, but also the shape of the model. That is why we have foreign keys, to ensure that we don’t end up with Order Lines that don’t have a parent Order. And that is just the simplest of issues. Let us consider the following model:

Let us say that we want to delete an order. What should we do? That is a business decision, actually. But it is one that is enforced by the DB itself, keeping the data integrity.

When we are dealing with soft deletes, it is easy to get into situations where we have, for all intents and purposes, corrupt data, because Customer’s LastOrder (which is just a tiny optimization that no one thought about) now points to a soft deleted order.

Now, to be fair, if you are using NHibernate there are very easy ways to handle that, which actually handle cascades properly. It is still a more complex solution.

For myself, I much rather drop the requirement for soft deletes in the first place and head directly to why we care for that. Usually, this is an issue of auditing. You can’t remove data from the database once it is there, or the regulator will have your head on a silver platter and the price of the platter will be deducted from your salary.

The fun part is that it is so much easier to implement that with NHibernate, for that matter, I am not going to show you how to implement that feature, I am going to show how to implement a related one and leave building the auditable deletes as an exercise for the reader :-)

Here is how we implement audit trail for updates:

It is an example, so it is pretty raw, but it should be create what is going on.

Your task, if you agree to accept it, is to build a similar audit listener for deletes. This message will self destruct whenever it feels like.

 

Don’t Delete – Just Don’t

Udi Dahan | Tuesday, September 1st, 2009.

no deletes

After reading Ayende’s post advocating against “soft deletes” I felt that I should add a bit more to the topic as there were some important business semantics missing. As developers discuss the pertinence of using an IsDeleted column in the database to mark deletion, and the way this relates to reporting and auditing concerns is weighed, the core domain concepts rarely get a mention. Let’s first understand the business scenarios we’re modeling, the why behind them, before delving into the how of implementation.

The real world doesn ’t cascade

Let’s say our marketing department decides to delete an item from the catalog. Should all previous orders containing that item just disappear? And cascading farther, should all invoices for those orders be deleted as well? Going on, would we have to redo the company’s profit and loss statements?

Heaven forbid.

So, is Ayende wrong? Do we really need soft deletes after all?

On the one hand, we don’t want to leave our database in an inconsistent state with invoices pointing to non-existent orders, but on the other hand, our users did ask us to delete an entity.

Or did they?

When all you have is a hammer…

We’ve been exposing users to entity-based interfaces with “create, read, update, delete” semantics in them for so long that they have started presenting us requirements using that same language, even though it’s an extremely poor fit.

Instead of accepting “delete” as a normal user action, let’s go into why users “delete” stuff, and what they actually intend to do.

The guys in marketing can’t actually make all physical instances of a product disappear – nor would they want to. In talking with these users, we might discover that their intent is quite different:

“What I mean by ‘delete’ is that the product should be discontinued. We don’t want to sell this line of product anymore. We want to get rid of the inventory we have, but not order any more from our supplier. The product shouldn’t appear any more when customers do a product search or category listing, but the guys in the warehouse will still need to manage these items in the interim. It’s much shorter to just say ‘delete’ though.”

There seem to be quite a few interesting business rules and processes there, but nothing that looks like it could be solved by a single database column.

Model the task, not the data

Looking back at the story our friend from marketing told us, his intent is to discontinue the product – not to delete it in any technical sense of the word. As such, we probably should provide a more explicit representation of this task in the user interface than just selecting a row in some grid and clicking the ‘delete’ button (and “Are you sure?” isn’t it).

As we broaden our perspective to more parts of the system, we see this same pattern repeating:

Orders aren’t deleted – they’re cancelled. There may also be fees incurred if the order is canceled too late.

Employees aren’t deleted – they’re fired (or possibly retired). A compensation package often needs to be handled.

Jobs aren’t deleted – they’re filled (or their requisition is revoked).

In all cases, the thing we should focus on is the task the user wishes to perform, rather than on the technical action to be performed on one entity or another. In almost all cases, more than one entity needs to be considered.

Statuses

In all the examples above, what we see is a replacement of the technical action ‘delete’ with a relevant business action. At the entity level, instead of having a (hidden) technical WasDeleted status, we see an explicit business status that users need to be aware of.

The manager of the warehouse needs to know that a product is discontinued so that they don’t order any more stock from the supplier. In today’s world of retail with Vendor Managed Inventory, this often happens together with a modification to an agreement with the vendor, or possibly a cancellation of that agreement.

This isn’t just a case of transactional or reporting boundaries – users in different contexts need to see different things at different times as the status changes to reflect the entity’s place in the business lifecycle. Customers shouldn’t see discontinued products at all. Warehouse workers should, that is, until the corresponding Stock Keeping Unit (SKU) has been revoked (another status) after we’ve sold all the inventory we wanted (and maybe returned the rest back to the supplier).

Rules and Validation

When looking at the world through over-simplified-delete-glasses, we may consider the logic dictating when we can delete to be quite simple: do some role-based-security checks, check that the entity exists, delete. Piece of cake.

The real world is a bigger, more complicated cake.

Let’s consider deleting an order, or rather, canceling it. On top of the regular security checks, we’ve got some rules to consider:

If the order has already been delivered, check if the customer isn’t happy with what they got, and go about returning the order.

If the order contained products “made to order”, charge the customer for a portion (or all) of the order (based on other rules).

And more…

Deciding what the next status should be may very well depend on the current business status of the entity. Deciding if that change of state is allowed is context and time specific – at one point in time the task may have been allowed, but later not. The logic here is not necessarily entirely related to the entity being “deleted” – there may be other entities which need to be checked, and whose status may also need to be changed as well.

Summary

I know that some of you are thinking, “my system isn’t that complex – we can just delete and be done with it”.

My question to you would be, have you asked your users why they’re deleting things? Have you asked them about additional statuses and rules dictating how entities move as groups between them? You don’t want the success of your project to be undermined by that kind of unfounded assumption, do you?

The reason we’re given budgets to build business applications is because of the richness in business rules and statuses that ultimately provide value to users and a competitive advantage to the business. If that value wasn’t there, wouldn’t we be serving our users better by just giving them Microsoft Access?

In closing, given that you’re not giving your users MS Access, don’t think about deleting entities. Look for the reason why. Understand the different statuses that entities move between. Ask which users need to care about which status. I know it doesn’t show up as nicely on your resume as “3 years WXF”, but “saved the company $4 million in wasted inventory” does speak volumes.

One last sentence: Don’t delete. Just don’t.