Let me just start by saying that I’m not biased – I’m really not.
No, really, I’m not.
I promise I’m not biased, cross my heart.
Honestly I get no more out of recommending IBM than I do from anyone else.
Really, I’m working with NetApp this week, EMC next week, and HDS the week after that.
I’m not biased at all.
If I seem to be labouring the point, it’s because over the last year and a half, I’ve found that every time I talk about how good IBM’s XIV storage array is, after a few minutes people start giving me funny looks (funnier than usual) and asking “what’s in it for you?”.
Given that I spent the first years of my IT career designing solutions around EMC Symmetrix, and in the years since have designed storage solutions for every major player in the market (and a good number of minor ones), I really don’t see myself as having any one favourite. Without exception, the organisations I’ve worked for have been vendor-neutral, and I’ve never really got the hang of the cordial hatred that vendors seem to have for each other’s products.
Lately, I’ve found that many people are primed with the idea that anyone who even mentions XIV as a possible solution must be in the pocket of the IBM sales mafia. I’ve no sooner begun talking about the benefits of the architecture than people begin to question my impartiality.
I’ve come to the conclusion that the problem is, people are used to storage technologies (and technology in general) letting them down. No storage technology is ever perfect – there are always hidden flaws and gotchas which surface only after the array has your organisation’s most precious data stored in its belly.
So anyone who comes along talking enthusiastically about an array which “just works” is automatically suspect. It’s big, it’s expensive – so it must have problems. So in this article I’m going to look briefly at the benefits of the array but concentrate mainly on the issues I’ve experienced in the last year and a half of working with XIV, thus finally demonstrating my sceptical side.
The benefits
Anyone who’s read a marketing slide from IBM knows the benefits of XIV – it’s easy to manage, stores up to 79TB in a single rack, is highly resilient, and performs well at a low price. With an increasing number of my customers running happily on XIV, I have no reason to disagree – in the large, well-publicised (important) areas covered by the marketing brochures, the XIV really does “just work”.
To me then, the XIV has earned its place at the top table. At 79TB of capacity and 50,000 to 70,000 IOPS each, it’s never going to compete on a 1:1 basis with the largest Symmetrix or Tagmastore arrays (200,000 IOPS and 600TB of Tier 1 storage, anyone?), but then it costs around a tenth of what those arrays do, and I’ve found that several XIV arrays will work as well as one large array (a Tier 1 array with the capabilities discussed above will take up 9-10 rack spaces in the datacentre, compared to 4 for the equivalent XIV).
The old chestnut
“Double drive failure on an XIV will lose data!” scream competing vendors, somehow managing to imply that in a similar situation, their own systems would operate untouched.
Really?
Maybe one day this will happen to one of my XIV customers and I’ll know for sure – in the meantime I have to go off “interpretations of the architecture” and “assumptions of how the array will work”.
I’ll use my own interpretation, thanks 🙂
My reading of the system is that data loss is at least statistically possible in the case of a simultaneous double drive failure. Data entering XIV is split into 1MB chunks (XIV confusingly calls them “partitions”). Each partition is copied, and the two copies are spread semi-evenly across the array. Distribution is not random: much work goes into keeping the two copies of a partition on separate drives, in separate modules, at opposite ends of the array. But at some point the two copies have to sit on two particular disks – if those two disks are lost, that data is gone. And once you have a million of those partitions floating around on any given disk, the counterparts of some of them will end up on every other disk in the array.
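For the sceptically minded, the effect is easy to demonstrate with a toy model. The sketch below is emphatically not IBM’s placement algorithm – the module and disk counts, the random placement, and the amount of data written are all my own assumptions about a full-frame array – but it shows why, once enough mirrored 1MB partitions exist, any pair of disks in different modules ends up holding some data in common:

```python
import random
from collections import defaultdict

# Toy model of XIV-style placement - NOT IBM's real algorithm.
# Assumed full-frame figures: 15 modules x 12 disks, 1MB partitions,
# and ~1TB of user data written (the effect scales with fill level).
MODULES, DISKS_PER_MODULE = 15, 12
N_PARTITIONS = 1_000_000                      # ~1TB of 1MB partitions

random.seed(1)
shared_mb = defaultdict(int)                  # (diskA, diskB) -> MB mirrored on exactly that pair

for _ in range(N_PARTITIONS):
    primary = (random.randrange(MODULES), random.randrange(DISKS_PER_MODULE))
    # The mirror copy always lands in a different module
    other_module = random.choice([m for m in range(MODULES) if m != primary[0]])
    secondary = (other_module, random.randrange(DISKS_PER_MODULE))
    shared_mb[tuple(sorted((primary, secondary)))] += 1

cross_module_pairs = (MODULES * DISKS_PER_MODULE) * (DISKS_PER_MODULE * (MODULES - 1)) // 2
print(f"Cross-module disk pairs in the array: {cross_module_pairs}")     # 15,120
print(f"Pairs sharing at least one partition: {len(shared_mb)}")
print(f"Data lost if disks (0,0) and (7,5) die together: "
      f"{shared_mb[tuple(sorted(((0, 0), (7, 5))))]} MB")
```

With these assumptions the answer comes out at a few tens of MB per disk pair for each terabyte written – so a simultaneous loss of two disks is statistically certain to lose something, but only ever a small slice of the array, which is consistent with the worst-case figures quoted below.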
But how likely is that situation to occur? In 10 years I experienced exactly two incidents of double disk failure – in both cases within the same RAID Group, and a good number of hours apart. In the first incident it was, fortunately, the hot-spare drive that failed during the rebuild – embarrassing and time-consuming to fix, but no data was lost. In the second incident the RAID Group itself was lost and had to be recovered from backups. In both cases, the explanation given for the second failure was age, coupled with the stress the rebuild process puts on the disks (72+ hours of sustained writes for the rebuild, on top of trying to keep normal operation going, cannot be good for a disk).
|                                       | RAID 5 146GB (4+1) | RAID 6 146GB (12+2) | XIV 1TB (RAID-X) |
| Number of copies of data              | 2                  | 3                   | 2                |
| Rebuild time after disk loss          | 72+ hours          | 72+ hours           | 30 minutes       |
| How many drives can be lost?          | 1                  | 2                   | 1                |
| Disks involved in the rebuild         | 4                  | 12                  | 160              |
| Increase in full load during rebuild  | 20%                | 7%                  | 0.63%            |
| Estimated lost data if 2 disks lost   | 526 GB             | 0 GB                | 9 GB             |
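For what it’s worth, the rebuild-load row can be sanity-checked with some very rough arithmetic. My reading (an assumption on my part, not a vendor statement) is that it simply reflects how many disks share the rebuild work:

```python
# Rough sanity check of the "increase in full load during rebuild" row.
# Assumption: rebuild work is spread evenly across the disks in the group,
# so each disk's extra load is roughly 1 / (disks sharing the rebuild).
for label, disks_in_group in [("RAID 5 (4+1)", 5),
                              ("RAID 6 (12+2)", 14),
                              ("XIV (~160 active disks)", 160)]:
    print(f"{label}: ~{100 / disks_in_group:.3g}% extra load per disk")
```

That gives 20%, 7.14% and 0.625% respectively – matching the table row to within rounding, and going a long way to explaining why an XIV rebuild finishes in minutes rather than days.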
The point for me is that the examples above don’t make me stop discussing RAID 5 solutions with customers – if a customer wants to survive a double drive failure, they put in RAID 6, accept that they need many more drives to get the same performance (and start worrying about triple drive failure). Implementing RAID 5 carries the risk that two drives may go at some point and data may be lost – and the same may be true of the XIV. If our data really is that important, isn’t this what we have backup systems, snapshots, and cross-site replication for?
A little history
To me, it’s one of those issues that gets blown out of proportion. Back in the early 2000s, I kept hearing tales of EMC salespeople who went into meetings with customers only to be asked probing questions (rather obviously planted by competitors) along the lines of “why is the Symmetrix global cache a single point of failure?”
This was a problem for EMC, because the answer is “it may look like that’s the case, but actually it’s not an issue – here’s why… [continued for the next three weeks]”. At the time, what the customer saw was EMC sales teams descending into jargon and complex technobabble rather than just giving a simple (to them) yes or no answer. To me this is the same issue – the explanation of why there is no real problem is so involved that customers lose patience.
EMC veteran blogger Chuck Hollis says it better than I could in his discussion of reasons why EMC delayed implementing RAID 6. The full text can be found here:
http://chucksblog.emc.com/chucks_blog/2007/01/to_raid_6_or_no.html
Comments that I find particularly relevant to this discussion are:
“And way, way, way down the list – almost statistically insignificant – was dual disk failure in a single LUN group.”
“There’s a certain part of the storage market that is obsessed with specific marketing features, rather than results claimed”
“As a result of our decision [delaying RAID 6 implementation], I’m sure that every day someone somewhere is being pounded for the fact that EMC doesn’t offer RAID 6 like some of the other guys”
The issues
So, having gone through all of that, the IBM XIV is perfect?
No chance.
Over the last year and a half, a number of issues have become obvious in the operation of the XIV. None are fundamental to the technology itself, but have formed a barrier to customer take-up of the array.
1. High capacity entry point:
XIV can be sold with a minimum of 6 modules, or 27TB. This is way down on the “at launch” configuration of 79TB only, but is still too high an entry point for many customers. This high starting point pretty much assures the survival, in some form, of XIV’s internal IBM competitors, the DS3000 and DS5000 – these arrays can scale from much smaller volumes of storage, so they will need to be kept alive to serve the low end of the market, at least until XIV can be sold in single-module configurations.
2. Upgrade Step:
Once customers reach 27TB, the next step up from 6 modules is 9 modules – 43TB. Again, this is a tremendous jump, and it has put off some customers who would prefer a smoother upgrade path.
3. Lack of iSCSI in the low-end configurations:
So your small customer has spent more than he needed to on a very performant, very easy to manage storage array, but at least he can save money by using iSCSI and avoiding the cost of dedicated fibre channel switches and HBAs? Not a chance – in the 27TB configuration, XIV has no iSCSI capability. Until you upgrade to 43TB you don’t even get the physical iSCSI ports. So the segment of the market that could make the best use of iSCSI doesn’t get to use it.
4. Rigid linkage of performance to capacity:
Traditional storage tends to have a central processor, with capacity added by adding disk trays. For increased performance, the central processor is upgraded and faster disks (or solid state disks) can be added. With XIV, the growth model is fixed – each module adds both disk capacity and performance, as cache and processors are built into the module itself. This is an immensely simple way of doing things and ensures that customers always know how much performance headroom is available, but it’s a double-edged sword.
I have found there are times when a customer has a tiny storage volume but a massive performance requirement (in a recent case, 5TB of capacity with an average 50,000 IOPS requirement). At that point, the customer has two choices:
A) Provision the largest storage processor available on a traditional-model storage array and pack it with SSD drives, or
B) Provision a full 79TB XIV to get the benefits of the entire 15 modules of cache and processor.
Both of these end up at roughly the same price, but with option B the purchaser has to explain to his superiors why he has 74TB of capacity that no one else can use (he then has to explain it again next week when finance decide they want some of his unused space, and the week after that to purchasing…).
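To put rough numbers on that, here is a minimal sizing sketch. The per-module IOPS and capacity figures are my own approximations derived from the full-rack numbers quoted earlier, not IBM specifications:

```python
import math

# Assumed per-module figures for a 15-module rack (approximations, not IBM specs)
FULL_RACK_IOPS = 60_000            # mid-point of the 50-70,000 IOPS range above
FULL_RACK_TB = 79
MODULES_FULL = 15
IOPS_PER_MODULE = FULL_RACK_IOPS / MODULES_FULL      # ~4,000 IOPS
TB_PER_MODULE = FULL_RACK_TB / MODULES_FULL          # ~5.3 TB

def modules_needed(required_iops, required_tb):
    """Cache, CPU and disks scale together, so you must buy enough modules
    to satisfy whichever of the two requirements is larger."""
    return max(math.ceil(required_iops / IOPS_PER_MODULE),
               math.ceil(required_tb / TB_PER_MODULE))

# The customer case above: 5TB of data but a 50,000 IOPS requirement
n = modules_needed(50_000, 5)
print(f"Modules needed: {n} of {MODULES_FULL}")       # -> 13 on these assumptions
print(f"Capacity bought: ~{n * TB_PER_MODULE:.0f} TB for 5 TB of data")
```

On these assumptions the IOPS target alone drags you to roughly thirteen modules – which for practical purposes is option B, with all the surplus capacity that comes with it.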
Some might say this is exactly the situation that capacity-on-demand models were made for, but in the case above, rare as it may be, CoD will only stretch so far. If a customer is demanding that 90% of the technology delivered will never be used (or paid for), it may not be a commercially advantageous deal to make.
5. Lack of Control:
This is one which has only become apparent as the first IBM-badged XIV arrays have come to the end of their original support contracts.
IBM restricts access to a number of key technical functions. Manually phasing out (removing from service) and phasing back in of modules and disks can only be performed by an IBM technician – access to these functions is locked, and guarded by wolves.
So you replace a disk – and until IBM support remotely phase your disk back in, that disk will just sit there, glowing a friendly yellow colour in your display and not taking on data.
The upside of this is that IBM support will probably call you immediately to tell you that the disk is awaiting phase in, and ask would you like them to do it (this has happened to me on a number of occasions).
But hey – now you’re out of support! IBM will no longer phone you; when you call, you’ve no support contract to draw on to get them to do the work for you; and to top it all off, you don’t have access to do it yourself. A number of the underlying controls are locked away and IBM appears to have no plans to give out access, precluding any form of “break/fix” maintenance option.
This sort of IBM control-freakery is what keeps me awake at night – a decision by an IBM suit on the other side of the world may make perfect sense at the time, but 3am in a cold datacentre, with IBM on the phone telling me that my “issue can’t be resolved” due to that policy, is really not when I want to find out I have an “insurmountable opportunity” on my hands.
6. Scheduling of Snapshots:
XIV makes copying data incredibly easy. Snapshots are created at the click of a button, can be made read/write and mounted as a development volume. Up to 16,000 can be created and maintained at any one time (eat that, “8 snaps maximum and 20% drop in performance” traditional storage!).
So why IBM, couldn’t you have included a built-in scheduler in the XIV interface, to let me make a new copy of a snapshot at regular intervals?
Oh, you did? You included built-in scheduling for the snapshots used by the asynchronous replication process? So new replication snapshots can be created and overwritten on a regular basis to ensure that the replication stays on target? But for my own application snapshots, I still have to buy an external replication manager application and set it up outside of XIV? Thanks a bunch, IBM.
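The workaround is, inevitably, to schedule it yourself from outside the array. A minimal sketch of the idea is below; it assumes the XCLI command-line tool is installed and reachable, and the snapshot_create/snapshot_delete syntax is written from memory, so treat the exact commands, arguments, hostname, credentials and volume name as assumptions to verify against the XCLI documentation for your code level:

```python
import subprocess
import time
from datetime import datetime

# Assumed XCLI invocation and command syntax - verify against your XCLI
# version before use; hostname, credentials and volume name are examples.
XCLI = ["xcli", "-m", "xiv01.example.com", "-u", "snapadmin", "-p", "secret"]
VOLUME = "prod_db_vol"
INTERVAL_SECONDS = 4 * 60 * 60          # new snapshot every 4 hours
KEEP = 6                                # rotate out everything older than the last 6

def create_snapshot():
    name = f"{VOLUME}_snap_{datetime.now():%Y%m%d_%H%M%S}"
    subprocess.run(XCLI + ["snapshot_create", f"vol={VOLUME}", f"name={name}"],
                   check=True)
    return name

def delete_snapshot(name):
    subprocess.run(XCLI + ["snapshot_delete", f"snapshot={name}"], check=True)

snapshots = []
while True:
    snapshots.append(create_snapshot())
    if len(snapshots) > KEEP:
        delete_snapshot(snapshots.pop(0))
    time.sleep(INTERVAL_SECONDS)
```

It works, but it’s exactly the sort of plumbing a built-in scheduler would make unnecessary – and presumably much the same thing the external replication manager ends up doing on your behalf anyway.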
7. Support for older AIX versions:
Of all the operating systems supported by XIV, none has given as much trouble as AIX. From direct fibre connection support (there isn’t any, end of story) to load balancing (there is, but only on newer releases and it needs to be set manually), the AIX/XIV combination has taken some time to get into a usable state. Recently though, as long as you’re on 5.3.10 or 6.1, you’re sorted.
If you happen to have applications which need to remain on a version older than 5.3.10, you may find you have issues – starting with no automated load balancing and low queue depths. This in turn leads to low performance, complaints, heart-burn, indigestion and generally bad stuff.
I’m not an AIX expert, but from the workarounds described in various places on the web, the cure is worse than the disease – manual scripting and complex processes which take all the fun out of managing an XIV environment.
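To give a flavour of what those workarounds look like, here is a sketch of the sort of script people end up writing – walking the hdisks and pushing up the queue depth and MPIO path-selection algorithm by hand. The attribute names (queue_depth, algorithm, reserve_policy) are the standard AIX MPIO ones, but the values, the filtering of which hdisks are actually XIV LUNs, and whether -P plus a reboot is needed are all assumptions to confirm for your own environment:

```python
import subprocess

# Hand-rolled MPIO/queue-depth tuning for XIV LUNs on older AIX releases.
# Values are illustrative assumptions - confirm them for your environment.
DESIRED = {"queue_depth": "64",
           "algorithm": "round_robin",
           "reserve_policy": "no_reserve"}

def list_hdisks():
    out = subprocess.run(["lsdev", "-Cc", "disk"],
                         capture_output=True, text=True, check=True).stdout
    # In real life you would filter this list down to the XIV LUNs
    # (e.g. by device description) rather than touching every disk.
    return [line.split()[0] for line in out.splitlines() if line.startswith("hdisk")]

for hdisk in list_hdisks():
    cmd = ["chdev", "-l", hdisk, "-P"]      # -P: defer to next reboot if the disk is busy
    for attr, value in DESIRED.items():
        cmd += ["-a", f"{attr}={value}"]
    print("Would run:", " ".join(cmd))
    # subprocess.run(cmd, check=True)       # uncomment once you are happy with the list
```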
IBM’s response so far has been:
“All you Luddites join the 21st century and upgrade to 6.1 – and can we sell you Power 7 while we’re at it….”
In Conclusion
In the last year and a half, I’ve gone from disbelief in the claims made by IBM, to grudging acceptance, to a genuine liking for the product. It is simple, powerful and cost-effective, and I can see why there is a concerted effort by competing vendors to remove it from the field by throwing up a FUD screen around it.
The point for me is that most of the negative comments I’ve seen tend to be of the “there is no possible way that XIV can do what it claims!!!” school of thought. With a growing customer base running everything from Tier 1 Oracle, to MS Exchange, to disk backup systems on the arrays, I beg to differ.
It does no single thing massively better than the competition, but just does everything very well:
– It’s easy to manage (but so is an HP EVA or SUN 7000)
– It’s very fast for its size and cost (but an EMC Symmetrix is faster)
– It packs a large volume into a small footprint (but an HDS USP holds more and can virtualise)
But notice that no single competitor covers all three – XIV appears in every category, while the competition tends to focus on one or two areas.
As a solutions designer, I see the acceptance of XIV’s simple way of working leading to an end of complex LUN and RAID Group maps, the end of pre-allocation of storage months in advance, the end of pen and paper resizing exercises, and so on.
That said, as can be seen above, the array does have some nagging problems, most of them soluble if the will exists. In most of these decisions (e.g. no iSCSI in the low-end arrays) I see the corporate hand of IBM – “let’s not make the box too convenient for small customers, or they’ll never buy upgrades”.
I see the IBM connection as a two-edged sword. On the one hand, without IBM’s name, support, and R&D spend, XIV would still be languishing down in the challengers’ space with the likes of Compellent and Pillar Data. Being harnessed to the IBM machine leap-frogged it into a mainstream position, bypassing potentially years or decades of effort.
But on the other hand, fitting XIV into IBM’s corporate strategy causes decisions that are hard to stomach. The XIV modules that are added as part of the 6 to 9 upgrade have only one difference – the addition of a dedicated card for iSCSI connection. There is no reason I can see why these cards could not have been added to the first set of modules to allow iSCSI at 27TB.
In the last year, I’ve probably spent as much time designing solutions around other vendors’ products as I have around IBM’s. This is because in the real world a buying decision takes in many more factors than just “which disk array is the fastest/best/cheapest”.
EMC, for example, have a completeness of offering in the storage area which IBM can only aspire to; having spent the last 10 years developing an ecosystem of complementary products, EMC are fixing problems that IBM hasn’t even really started to address, beyond a spate of acquisitions of the kind EMC started before 2000 and has continued ever since.
I’m pretty sure that while the other vendors are attacking the XIV way of doing things, in the background they’ll also be coming up with their own ways to match it – IBM has a head-start, but that’s all it is – a temporary advantage in one field of storage. To capitalise, they need to stop thinking that the IBM way is the only way and look at some of the engineering decisions that are not working for customers.
All of the issues discussed above are ones I have experienced myself, either during design or during implementation. Whether they are an impediment to a customer considering XIV will depend very much on the situation. Personally, I find few occasions when XIV is not at least worth a look, even if it’s not the be-all and end-all.
But maybe I’m just biased.
Update – a few days after I posted my article, Tony Pearson at IBM posted an article making some similar points regarding double drive failure, but providing one additional, very important piece of information.
Tony is careful to point out that no customer has ever experienced a double drive failure, but his calculations for worst-case data loss match mine at 9GB (it’s always nice when the professionals agree with you) 😉
The additional info concerns the “Union List” – something we’ve known must exist, but which up till now I’d not seen published confirmation of (secretive bunch, IBM). Basically, the Union List tells you which 9GB of data has been lost, in the form of a list of logical block addresses, allowing targeted recovery of the lost data.
I’ve not seen this in action, so I have no idea how well it works in practice, but I’m going to have much fun pursuing it with IBM over the next few months…
Update (2) – A couple of other articles have linked to this one. Thanks to both Simon Sharwood at Techtarget ANZ and Ianhf at Grumpy Storage for their favourable reviews and the redirect – I wondered where all the traffic was coming from! Will try to return the favour sometime 🙂