So here's the thing about de-duplication.

Let's say the device is empty and this is my first backup. The first block of data is, by definition, unique, so it can't be deduped; it gets written. Now here comes block two, and it needs to be compared. Comparing one block to another is not a huge deal, but clearly they aren't going to compare entire blocks; it must be a hash function of some sort. So what is a block, then? Is it just every 1KB? 16KB? 256KB (the size of an LTO block)? Every 1MB?
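A minimal sketch of the hash-compare idea above. It assumes fixed-size 4 KiB blocks (a hypothetical choice, not ProtecTIER's) and SHA-256 fingerprints. The dict acts as the hash index, so each incoming block costs one lookup instead of a comparison against every stored block. A real system would also have to deal with hash collisions, which this toy ignores.

```python
import hashlib

BLOCK_SIZE = 4096  # hypothetical; the real block/chunk size is product-specific


def dedup_fixed(data: bytes, store: dict, block_size: int = BLOCK_SIZE) -> list:
    """Store only blocks whose fingerprint is new; return the pointer list."""
    recipe = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Compare a 32-byte SHA-256 fingerprint instead of the whole block.
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint not in store:
            store[fingerprint] = block      # first occurrence: physically written
        recipe.append(fingerprint)          # every occurrence: just a pointer
    return recipe


def restore(recipe: list, store: dict) -> bytes:
    """Rebuild the original data by following the pointers in order."""
    return b"".join(store[fp] for fp in recipe)


store = {}
data = b"A" * 8192 + b"B" * 4096          # three 4 KiB blocks, two of them identical
recipe = dedup_fixed(data, store)
print(len(recipe), "blocks written logically,", len(store), "stored physically")
# 3 blocks written logically, 2 stored physically
```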

Then it gets harder, because after tens of thousands of blocks have been written I wouldn't expect we could compare against every one of them on the fly, at least not without a hash. And of course you've got to keep track of how many tapes use a particular block, or you could delete it when you think it's no longer needed, and that would be bad. So that tracking is also part of the process. The thing that puzzles me is how a significant number of those blocks stay the same. If I add a few bytes to a text string early in a large table, wouldn't that offset the bytes in every subsequent block for that table, even that library, making them different almost every time? I don't know that to be true; I'm just guessing.
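The offset problem described above is real for fixed-size blocks. One well-known answer is content-defined chunking: cut points are chosen by a rolling hash of the data itself, so an insertion only disturbs the chunks near it. The sketch below uses a simple gear-style rolling hash to illustrate the general technique; it is not a description of how HyperFactor works, and the table seed, boundary mask and min/max chunk sizes are illustrative assumptions.

```python
import hashlib
import random

# Deterministic 256-entry "gear" table of random 32-bit values (illustrative seed).
_rng = random.Random(42)
GEAR = [_rng.getrandbits(32) for _ in range(256)]
BOUNDARY_MASK = 0xFFF00000       # top 12 bits zero: about 1 candidate per 4 KiB
MIN_CHUNK, MAX_CHUNK = 1024, 16384


def cdc_chunks(data: bytes) -> list:
    """Content-defined chunking: cut where a rolling hash of recent bytes matches."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear rolling hash: after 32 shifts a byte no longer affects the 32-bit
        # value, so a cut point depends only on the most recent 32 bytes of content.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])            # trailing partial chunk
    return chunks


def fixed_chunks(data: bytes, size: int = 4096) -> list:
    """Fixed-size blocks: every boundary is tied to an absolute byte offset."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(old: list, new: list) -> float:
    """Fraction of the new chunks that already exist in the old backup."""
    known = {hashlib.sha256(c).digest() for c in old}
    return sum(hashlib.sha256(c).digest() in known for c in new) / len(new)


original = random.Random(7).randbytes(256 * 1024)    # stand-in for a table's save data
edited = original[:100] + b"hello" + original[100:]   # a few bytes added near the front

fixed_share = shared_fraction(fixed_chunks(original), fixed_chunks(edited))
cdc_share = shared_fraction(cdc_chunks(original), cdc_chunks(edited))
print(f"fixed-size blocks reused: {fixed_share:.0%}, content-defined: {cdc_share:.0%}")
```

After the 5-byte insert, the fixed-size split shares essentially nothing with the previous backup, while the content-defined split typically reuses all but the chunk containing the change.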

I realize you probably don't have the de-dup code handy to refer to, as I suspect it's kind of proprietary, but it IS an interesting problem to me.

- Larry "DrFranken" Bolhuis

www.Frankeni.com
www.iDevCloud.com - Personal Development IBM i timeshare service.
www.iInTheCloud.com - Commercial IBM i Cloud Hosting.

On 8/17/2015 10:24 AM, Kendall Kinnear wrote:

You're going to really work my memory now. I haven't thought about this stuff in over a year. :-)

If I remember correctly, the deduplication is inline (on the fly): it happens as the data passes through the device, before it is written to disk. If you think about it, it doesn't matter whether all the data is available yet. The incoming data is checked against what's already stored on disk, and if that is all that's needed, only the dedup pointers are written.
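To make "only the dedup pointers are written" concrete, here is a toy inline repository, assuming fingerprint lookups in a hash index. Each incoming block is checked as it arrives, a virtual cartridge is just an ordered list of pointers, and a per-block reference count covers Larry's point about not deleting a block that another tape still uses. The class and method names (DedupRepository, write, expire, read) are hypothetical and do not reflect ProtecTIER's internals.

```python
import hashlib
from collections import Counter


class DedupRepository:
    """Toy inline-dedup repository with reference counts (hypothetical design)."""

    def __init__(self):
        self.blocks = {}            # fingerprint -> block stored once on "disk"
        self.refs = Counter()       # fingerprint -> how many pointers use it
        self.cartridges = {}        # virtual volume -> ordered list of pointers

    def write(self, volume, blocks):
        """Deduplicate each block as it arrives; return bytes physically stored."""
        pointers = self.cartridges.setdefault(volume, [])
        stored = 0
        for block in blocks:        # 'blocks' may be a generator; nothing is buffered
            fp = hashlib.sha256(block).digest()
            if fp not in self.blocks:
                self.blocks[fp] = block
                stored += len(block)
            self.refs[fp] += 1
            pointers.append(fp)
        return stored

    def expire(self, volume):
        """Scratch a virtual cartridge; free only blocks no other pointer uses."""
        freed = 0
        for fp in self.cartridges.pop(volume):
            self.refs[fp] -= 1
            if self.refs[fp] == 0:
                del self.refs[fp]
                freed += len(self.blocks.pop(fp))
        return freed

    def read(self, volume):
        """Reassemble a cartridge's data from its pointers."""
        return b"".join(self.blocks[fp] for fp in self.cartridges[volume])


repo = DedupRepository()
print(repo.write("VOL001", [b"aaaa", b"bbbb"]))  # 8: both blocks are new
print(repo.write("VOL002", [b"aaaa", b"cccc"]))  # 4: b"aaaa" becomes a pointer
print(repo.expire("VOL001"))                      # 4: b"bbbb" freed, b"aaaa" kept
print(repo.read("VOL002"))                        # b'aaaacccc'
```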

Yes, you can set up the replication to happen automatically with no human intervention.

I am pretty sure you can flag a virtual volume as read only but I don't remember for sure.

Respectfully,
Kendall Kinnear
System Analyst
Standard Motor Products, Inc.
Work: 972-316-8169
Mobile: 940-293-7541

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of DrFranken
Sent: Monday, August 17, 2015 9:13 AM
To: Midrange Systems Technical Discussion <midrange-l@xxxxxxxxxxxx>
Subject: Re: VTL and ProtecTIER deduplication

Kendall,

Clearly you have very detailed knowledge of this environment! So, a few follow-on questions if I may.

Does the de-duplication happen on the fly, that is during the save or is that analysis and de-dup done after the save when all the data is available?

Can you replicate a backup to a second ProtecTIER device so that it gets off-site with no manual handling?

Can you flag a tape in the ProtecTIER VTL as 'read only'?

Thanks!


- Larry "DrFranken" Bolhuis


On 8/14/2015 4:26 PM, Kendall Kinnear wrote:

1) I had a couple of clients using ProtecTIER when I was with a Business Partner. They had vastly different deduplication ratios due to their data makeup.

2) The IBM Business Partners have access to a spreadsheet that you can use with BRMS (and appropriate version/release/PTF) to estimate the amount of physical storage required.

3) Any LTO media should be able to be migrated into the VTL using DUPTAP functions.

4) You might want to change some things in BRMS, at a minimum the tape library you default to. More thoughts on BRMS under the later questions. The thing to remember about the ProtecTIER is that the IBM i simply sees it as an LTO tape library, nothing more and nothing less.

5) You'll need as many fibre adapters as you want permanent connections in your partitions. If you don't mind moving adapters between partitions then you can get by with fewer adapters.

6) Yes, you can run a large number of concurrent saves from different partitions or from the same partition; how many depends on your configuration.

7) Speed, there's the rub. Last time I checked (mid 2014), the ProtecTIER ran at approximately LTO3 speeds. You make up for that by having multiple virtual drives within the virtual library. Depending on the workload of the backup from a partition, you may need a single fibre that sees 3 or 4 drives assigned to that partition. There is not a 1-to-1 drive-to-fibre ratio with a VTL; each connection can have multiple virtual drives. This is where you'd change BRMS to do parallel saves across multiple virtual drives.
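The point about making up for per-drive speed with parallel virtual drives comes down to simple arithmetic. This sketch assumes each virtual drive sustains roughly LTO-3 native speed (80 MB/s, per the mid-2014 recollection above) and that each of Paul's four LTO5 half-height drives runs at the LTO-5 native rate of 140 MB/s. It ignores compression, fibre bandwidth and host limits, so treat the result as a rough starting point only.

```python
# Approximate native (uncompressed) LTO drive rates in MB/s.
NATIVE_MB_S = {"LTO-3": 80, "LTO-4": 120, "LTO-5": 140, "LTO-6": 160}


def virtual_drives_needed(target_mb_s: int, per_drive_mb_s: int) -> int:
    """Smallest number of parallel streams whose combined rate meets the target."""
    return -(-target_mb_s // per_drive_mb_s)   # integer ceiling division


# Paul's current setup: four LTO5 half-height drives running at once (assumed 140 MB/s each).
target = 4 * NATIVE_MB_S["LTO-5"]                           # 560 MB/s
print(virtual_drives_needed(target, NATIVE_MB_S["LTO-3"]))   # 7
```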

Respectfully,
Kendall Kinnear

-----Original Message-----
From: MIDRANGE-L [mailto:midrange-l-bounces@xxxxxxxxxxxx] On Behalf Of Steinmetz, Paul
Sent: Friday, August 14, 2015 2:39 PM
To: 'Midrange Systems Technical Discussion' <midrange-l@xxxxxxxxxxxx>
Subject: VTL and ProtecTIER deduplication

I'm referencing the Redbook "IBM ProtecTIER Implementation and Best Practices Guide".

1) Anyone using ProtecTIER deduplication?
I'm curious how much I might save, especially on large history files that do not change.

As data is written to the ProtecTIER device, it is examined for identical blocks of information that already were added to the repository. This identical data is not stored again in the repository; it is referenced as duplicate data and reduces the amount of disk space that is required. This process is known as deduplication. The engine for ProtecTIER deduplication is called HyperFactor.
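How much is saved depends on how often identical blocks recur across backups, which is why history files that never change are good candidates. The sketch below measures a deduplication ratio (nominal bytes sent divided by unique bytes kept) over ten weekly full saves. It assumes fixed 4 KiB blocks and synthetic data, so the numbers illustrate the idea rather than predict HyperFactor results.

```python
import hashlib


def dedup_ratio(backups: list, block_size: int = 4096) -> float:
    """Nominal bytes sent / unique bytes kept, across a series of full backups."""
    seen = set()
    nominal = physical = 0
    for data in backups:
        nominal += len(data)
        for off in range(0, len(data), block_size):
            block = data[off:off + block_size]
            fp = hashlib.sha256(block).digest()
            if fp not in seen:            # not already in the repository: store it
                seen.add(fp)
                physical += len(block)
    return nominal / physical


def synthetic(label: bytes, size: int) -> bytes:
    """Deterministic, non-repeating filler standing in for table data."""
    count = size // 32
    return b"".join(hashlib.sha256(label + i.to_bytes(4, "big")).digest()
                    for i in range(count))


history = synthetic(b"history", 1 << 20)   # 1 MiB "history file" that never changes
new_rows = synthetic(b"rows", 9 * 4096)    # rows appended, 4 KiB per week

static = [history] * 10                     # ten weekly full saves, unchanged
growing = [history + new_rows[:week * 4096] for week in range(10)]
print(f"static: {dedup_ratio(static):.1f}:1, growing: {dedup_ratio(growing):.2f}:1")
# static: 10.0:1, growing: 9.83:1
```

Rows appended at the end leave the earlier blocks aligned, so the growing file still dedupes almost as well as the static one; an insertion near the front of a file would not behave this way with fixed-size blocks.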


2) Is there any way of estimating how much storage is needed?
I currently have 77 full LTO5 cartridges on permanent retention: 77 x 3.0 TB (compressed) = 231 TB.

Create only the number of cartridges that your repository can handle, maybe even fewer to control the repository allocation of different VTLs. You can estimate the size of a repository by multiplying the real size of the repository by the HyperFactor ratio. Then, divide it by the tape size and determine the optimized number of tapes.
Important: Be careful not to overestimate the repository size. Wait until the backup application sends some data to provide a better view of the real deduplication ratio.
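The Redbook's sizing rule, written out as small functions. The ratios in the loop and the cartridge size in the usage are hypothetical placeholders; as the note above says, the real ratio should come from observing actual backups. The 231 TB figure is the compressed capacity of the 77 cartridges, so it is an upper bound on the data actually stored on them.

```python
import math


def nominal_capacity_tb(physical_tb: float, hyperfactor_ratio: float) -> float:
    """Redbook rule of thumb: nominal size = real repository size x HyperFactor ratio."""
    return physical_tb * hyperfactor_ratio


def cartridges_to_create(physical_tb: float, hyperfactor_ratio: float,
                         cartridge_tb: float) -> int:
    """Nominal capacity divided by cartridge size, rounded down (never over-create)."""
    return math.floor(nominal_capacity_tb(physical_tb, hyperfactor_ratio) / cartridge_tb)


def physical_needed_tb(nominal_tb: float, hyperfactor_ratio: float) -> float:
    """Inverse question: how large a repository holds a given nominal amount?"""
    return nominal_tb / hyperfactor_ratio


permanent_retention_tb = 77 * 3.0          # 77 full LTO5 cartridges, compressed capacity
for ratio in (4, 10):                       # hypothetical ratios, not measured values
    print(f"{ratio}:1 -> about {physical_needed_tb(permanent_retention_tb, ratio):.1f} TB")
```

For example, a hypothetical 50 TB repository at an assumed 10:1 ratio with 1.5 TB virtual cartridges gives `cartridges_to_create(50, 10, 1.5)`, which is 333 cartridges.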


3) Can existing LTO5 be migrated?

4) Will this change anything within BRMS?

5) Would I need an additional FC adapter in each LPAR to attach the VTL to multiple IBM i LPARs?

6) Can multiple processes be run simultaneously from multiple LPARs as I do now with 4 LTO5 HH?

7) Would a VTL be faster than LTO5, LTO6, or LTO7 (soon to be announced)?

18.2.1 Backup considerations with ProtecTIER
Using VTL is not necessarily faster than physical tape backup. IBM tape products have been tested and work efficiently with IBM i. IBM i is able to achieve 90% - 100% of tape drive speed in an environment with fewer tape drives. You often require multiple streams in a VTL to achieve the same performance throughput as physical tapes. In this scenario, Backup, Recovery, and Media Services (BRMS) is useful in managing the tape media resources for parallel saves.
In addition to performance throughput, you can use BRMS to share VTL resources across multiple LPARs.
BRMS tracks what you saved, when you saved it, and where it is saved. When you need to do a recovery, BRMS ensures that the correct information is restored from the correct tapes in the correct sequence.

Thank You
_____
Paul Steinmetz
IBM i Systems Administrator

Pencor Services, Inc.
462 Delaware Ave
Palmerton Pa 18071

610-826-9117 work
610-826-9188 fax
610-349-0913 cell
610-377-6012 home

psteinmetz@xxxxxxxxxx
http://www.pencor.com/

--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing
list To post a message email: MIDRANGE-L@xxxxxxxxxxxx To subscribe,
unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request@xxxxxxxxxxxx Before posting, please take a moment to review the archives at http://archive.midrange.com/midrange-l.




This mailing list archive is Copyright 1997-2024 by midrange.com and David Gibbs as a compilation work. Use of the archive is restricted to research of a business or technical nature. Any other uses are prohibited. Full details are available on our policy page.
