Sunday, December 07, 2008

An FAQ on Deduplication.

After my previous post, I thought an FAQ on deduplication would be the best way to reinforce the concepts. So here it is:

What type of data reduction ratios should you realistically expect using deduplication?
Realistically, I think it's safe to assume a ratio of anywhere from 13X to 17X. You'll probably see lower ratios with target-based deduplication and higher ratios with source-based deduplication, simply because of how they are architected.
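To put those ratios in perspective, here is a small sketch that converts a deduplication ratio into the percentage of capacity saved. The function name space_saved_percent is my own, and the only assumption is the usual definition of the ratio as logical data divided by stored data.

```python
def space_saved_percent(ratio: float) -> float:
    """Percent of capacity saved for a deduplication ratio of `ratio`:1."""
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    # A ratio of N:1 means only 1/N of the logical data is actually stored.
    return (1 - 1 / ratio) * 100


for ratio in (2, 3, 13, 17):
    print(f"{ratio}X -> {space_saved_percent(ratio):.1f}% saved")
```

Note how quickly the curve flattens: 13X already saves about 92.3% and 17X about 94.1%, so chasing a few extra points of ratio buys relatively little extra disk.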

Which data deduplication appliances and backup software do you view as enterprise ready?
Some of the technologies that may be enterprise-ready are those from Diligent Technologies and Sepaton. Both companies have had products on the market for some time now, and both are having pretty good success.

How long does it take for companies to achieve these data reduction ratios?
In the short term, you might see a reduction of 2X or 3X over the course of the first month or so. The larger numbers come the longer you keep data in the deduplicated store, because each new backup adds mostly data that is already there.
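Here is a rough model of why the ratio climbs with time. It is only a sketch under assumptions of my own: weekly full backups of a constant-size data set, 5% of the data changing between backups, and no retention expiry or compression. The function cumulative_ratio is hypothetical and is not taken from any product.

```python
def cumulative_ratio(num_backups: int, change_rate: float) -> float:
    """Logical data sent to the store divided by unique data actually kept."""
    if num_backups < 1:
        raise ValueError("need at least one backup")
    logical = num_backups                         # each full backup counts as 1 unit
    stored = 1 + (num_backups - 1) * change_rate  # first full, then only changed data
    return logical / stored


# Assumed 5% change between weekly fulls; real change rates vary widely.
for weeks in (1, 4, 12, 26, 52):
    print(f"after {weeks:2d} weekly fulls: {cumulative_ratio(weeks, 0.05):.1f}X")
```

With these assumed numbers the model gives roughly 3.5X after four backups and about 14.6X after a year, and it can never exceed 1 / change_rate, so your change rate and retention period matter more than anything else.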

When does data deduplication using backup software on the host make sense?

There are a couple of factors that you really need to consider. If you're bandwidth constrained and have large amounts of backup data coming over the network, then deduplicating at the host makes a lot of sense, because only new data has to cross the wire. That can dramatically reduce the bandwidth your backups consume.

It is important that the host can sustain the initial hit. This technology requires memory and CPU processing to perform the data deduplication. It might be a good idea to run the initial backup over a weekend when the backup window is a bit longer.
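A minimal sketch of the idea behind host-side deduplication, to show where the bandwidth saving and the CPU and memory cost come from. Everything here is illustrative: the fixed 4 KB chunk size, the use of SHA-256, and the function chunks_to_send are my assumptions, and real products typically use more sophisticated (often variable-size) chunking.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size, for illustration only


def chunks_to_send(data: bytes, known_hashes: set) -> list:
    """Return only the chunks whose hash the backup target has not seen yet."""
    new_chunks = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()  # hashing is the CPU cost on the host
        if digest not in known_hashes:
            known_hashes.add(digest)                # the hash index is the memory cost
            new_chunks.append(chunk)
    return new_chunks


known = set()
first = b"".join(bytes([i]) * CHUNK_SIZE for i in range(100))  # 100 distinct chunks
second = first[:-CHUNK_SIZE] + b"\xff" * CHUNK_SIZE            # same data, one chunk changed

sent_first = chunks_to_send(first, known)
sent_second = chunks_to_send(second, known)
print(len(sent_first), len(sent_second))
```

The first backup has to hash and send every chunk, which is the initial hit. The second backup still hashes everything on the host but sends only the single chunk that changed.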

Are there any instances where data deduplication will not provide any benefits? Superior benefits?

With photos, videos, etc., there's not a lot of duplicate information. If there are a lot of new images being created, then you'll see very little benefit from data deduplication. In that case, you're better off just running differential or incremental backups.
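To make the contrast concrete, here is a small sketch that measures a chunk-level ratio on two kinds of data. It rests on my own assumptions: random bytes stand in for compressed photos and video, chunks are a fixed 4 KB, and dedup_ratio is a hypothetical helper rather than any vendor's method.

```python
import hashlib
import os


def dedup_ratio(data: bytes, chunk_size: int = 4096) -> float:
    """Total chunks divided by distinct chunks, using fixed-size chunking."""
    if not data:
        raise ValueError("data must not be empty")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    unique = {hashlib.sha256(c).digest() for c in chunks}
    return len(chunks) / len(unique)


media_like = os.urandom(4096 * 256)             # random bytes as a stand-in for media files
repetitive = (b"A" * 4096 + b"B" * 4096) * 128  # the same two blocks repeated 128 times

print(f"media-like data: {dedup_ratio(media_like):.1f}X")
print(f"repetitive data: {dedup_ratio(repetitive):.1f}X")
```

The media-like data has no repeated chunks, so its ratio stays at 1X, while the repetitive data collapses to just two stored chunks.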
