Are zombie virtual machines coming to get you?
By Charles Clarke, Technical Director for APAC, Veeam Software
Wednesday, 09 July, 2014
The ancient human instinct not to throw away anything useful is leading to VM sprawl - the rise of zombie virtual machines that refuse to die.
There is no such thing as an IT department with too much money. That’s always been true, but budgets right now are universally tight. For many organisations I deal with, the question is not which ‘nice-to-have’ optimisation to leave off the list this year, but which essential project to delay.
This is not the only thing driving more and more organisations to virtualise their IT environments - there are many other factors like speed, reliability and flexibility - but making the best possible use of scarce resources is a big factor.
According to IDC, Australian companies that virtualise their servers and physical infrastructure can save $6 billion in costs between now and 2020. In addition, 6.4 million tonnes of CO2 could be avoided from 2003 to 2020 as a result of virtualisation.
The savings from virtualisation come from a simple fact: the more virtual servers you can run on a single physical machine, the less energy you use and the less often you have to buy new hardware. In most organisations, the business case for virtualisation is built on an assumed ‘consolidation ratio’ of between 10 and 20 to one - in other words, every physical machine will host 10-20 virtual machines (VMs).
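To make that concrete, here is a rough sketch of the hardware side of that business case - the workload count and ratios below are purely hypothetical:

```python
from math import ceil

def hosts_required(vm_count, consolidation_ratio):
    """Physical hosts needed to run vm_count VMs at a given consolidation ratio."""
    return ceil(vm_count / consolidation_ratio)

# Hypothetical example: 300 workloads that would each otherwise need a physical server
vm_count = 300
for ratio in (10, 15, 20):
    hosts = hosts_required(vm_count, ratio)
    print(f"{ratio}:1 consolidation -> {hosts} hosts ({vm_count - hosts} physical servers avoided)")
```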
Once virtualisation has been implemented, though, users quickly notice many other advantages. One of the key benefits is the sheer convenience of creating new virtual machines. Instead of going through a lengthy procurement process to buy a new server, a development or project team can commission as many new VMs as they need in minutes.
This is great - but if you don’t do your housekeeping properly, the proliferation of VMs can slow your environment to a crawl and fatally undermine the business case for going the virtual route in the first place. It’s such a common problem it even has a name: VM sprawl.
Undead virtual machines
Part of the reason VM sprawl happens is the ancient human instinct not to throw away anything useful, ‘just in case’. So even though the project is wrapped up - all the files are archived and there are full backups of all the VMs that can be restored at a moment’s notice - people still hesitate to delete the VMs themselves. Often the machine is unregistered, so it’s easy to forget about - but it’s still there, in a kind of undead zombie state.
Superficially it seems less risky to let your old VMs hang around in case you need them one day - after all, hardly anybody ever gets in trouble for NOT deleting something. But in reality, it’s VM sprawl that poses the real risks.
The first risk is the one that really kills your business case: every zombie VM is still using up valuable system resources, especially expensive storage. Let’s say a zombie VM has been allocated 500 GB of storage - even if that space is empty, it still can’t be used for anything else because it’s reserved for that VM. It’s like putting a traffic cone in an empty parking space - it instantly turns free space into wasted space.
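A back-of-the-envelope sketch (the VM names and sizes below are hypothetical) shows how quickly those traffic cones add up:

```python
# Hypothetical inventory: (VM name, allocated storage in GB, still in use?)
vms = [
    ("proj-alpha-db",  500, False),
    ("proj-alpha-web", 200, False),
    ("finance-app",    300, True),
    ("old-test-vm",    500, False),
]

# Storage reserved by VMs that nobody uses but nobody has deleted
zombie_gb = sum(size for _, size, in_use in vms if not in_use)
print(f"Storage reserved by zombie VMs: {zombie_gb} GB")  # 1200 GB of wasted space
```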
Combine zombie VMs with other junk data like old ISO files and defunct system snapshots and you can push an entire storage array to breaking point surprisingly quickly. The larger the organisation, the more people there are contributing to the sprawl and the faster trouble will happen, no matter how well resourced you are.
There’s also a compliance risk: how many operating system licences do you have? Every zombie VM is using up one of them. When the vendor comes to do an audit and finds you over the limit, “Whoops, we seem to have overlooked that one” is not going to be a good enough excuse.
Keeping a clean machine
Business processes that prevent the wanton creation of new VMs are one way to tackle the problem - but a process that overcomplicates things undermines the flexibility that made virtualisation attractive in the first place.
You also need to implement thin provisioning, particularly for SAN storage space, which is notoriously expensive. Thin provisioning means you can configure a VM with all the storage you think it will require - but physical storage is only consumed as data is actually written, up to the maximum you have configured. This ensures the most efficient use of your entire available storage pool.
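The effect is easiest to see in numbers. The sketch below uses hypothetical figures to compare what has been promised to VMs with what they have actually written - the gap is the capacity thin provisioning gives back to the pool:

```python
# Hypothetical VMs: storage configured for each vs what it has actually written, in GB
vms = {
    "web-01":   {"provisioned": 100, "used": 22},
    "db-01":    {"provisioned": 500, "used": 310},
    "build-01": {"provisioned": 200, "used": 45},
}

provisioned = sum(v["provisioned"] for v in vms.values())
used = sum(v["used"] for v in vms.values())

# Thick provisioning would consume the full 800 GB up front;
# thin provisioning consumes only the ~377 GB actually written.
print(f"Provisioned: {provisioned} GB, actually used: {used} GB")
print(f"Capacity freed for other workloads: {provisioned - used} GB")
```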
Another solution is to be rigorous about clearing out old and unused files - but to do that, you first need to find them. This is not a job that can be done manually - it needs specialised tools that can not only identify junk files but also enable you to delete them safely.
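Purpose-built tools understand which files belong to registered VMs and are therefore safe to remove; the sketch below only illustrates the first half of the job - finding candidate junk by age and type. The mount point, file types and threshold are all hypothetical, and it reports rather than deletes:

```python
import time
from pathlib import Path

STALE_DAYS = 180
CANDIDATE_SUFFIXES = {".iso", ".vmdk", ".vmsn"}  # illustration only - not an exhaustive list

def find_stale_files(root, stale_days=STALE_DAYS):
    """Yield (path, size in GB, age in days) for candidate junk files older than stale_days."""
    now = time.time()
    cutoff = now - stale_days * 86400
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix.lower() in CANDIDATE_SUFFIXES:
            stat = path.stat()
            if stat.st_mtime < cutoff:
                yield path, stat.st_size / 1024**3, (now - stat.st_mtime) / 86400

# Hypothetical datastore mount point - report only, leave deletion to proper tooling
for path, size_gb, age in find_stale_files("/mnt/datastore1"):
    print(f"{path}: {size_gb:.1f} GB, untouched for {age:.0f} days")
```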
Another problem that undermines the business case for virtualising is misallocation and over-allocation of system resources.
The easiest way to create a new VM is to do it from a template - every new machine gets, say, two CPUs, 8 GB of RAM and 100 GB of storage space. That’s a good, average spec - but unfortunately, not every VM has average needs. Some will need more resources and some will need much less - and it’s very seldom possible to know in advance which is which.
You need expert systems
Assigning too many resources is wasteful and inefficient - it needlessly ties up resources you could be using somewhere else. But assigning too few resources hurts performance.
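One way to spot both problems is to compare each VM’s template allocation with what a monitoring tool says it actually peaks at. A simplified sketch, with hypothetical utilisation figures:

```python
# Hypothetical peak utilisation of each VM's allocation (fraction of what the template granted)
peak_utilisation = {
    "build-server": {"vcpus": 0.97, "ram": 0.95, "disk": 0.90},
    "intranet-web": {"vcpus": 0.12, "ram": 0.20, "disk": 0.15},
}

for vm, peaks in peak_utilisation.items():
    for resource, fraction in peaks.items():
        if fraction > 0.9:
            print(f"{vm}: consider allocating more {resource} (peaks at {fraction:.0%})")
        elif fraction < 0.25:
            print(f"{vm}: consider reclaiming {resource} (peaks at only {fraction:.0%})")
```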
Finding the right balance between efficiency and performance needs the right tools - by which I mean tools specifically designed for managing virtual environments.
Virtualisation works because it allows us to overcommit resources, knowing that most processes don’t need as much as they are allocated. It’s like the way airlines overbook flights, knowing that there’s a fairly predictable proportion of people who won’t turn up. Because of this, and the multiple layers of abstraction created in the process of virtualising, it is almost impossible to understand the true resource usage of any VM using the same tools and approaches we use in physical environments.
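A quick sketch of what that ‘overbooking’ looks like on a single host - the capacities and VM sizes here are hypothetical:

```python
# Hypothetical host capacity and the VMs allocated to it
host = {"cores": 32, "ram_gb": 256}
vm_allocations = [{"vcpus": 4, "ram_gb": 16}] * 20  # twenty identically sized VMs

allocated_vcpus = sum(vm["vcpus"] for vm in vm_allocations)
allocated_ram = sum(vm["ram_gb"] for vm in vm_allocations)

# 80 vCPUs promised on 32 cores, 320 GB promised on 256 GB of physical RAM -
# perfectly normal in a virtual environment, as long as actual demand stays lower
print(f"vCPU overcommitment: {allocated_vcpus / host['cores']:.1f}x")
print(f"RAM overcommitment:  {allocated_ram / host['ram_gb']:.2f}x")
```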
You need the right monitoring and reporting tools to ensure that virtual overcommitment doesn’t become real. These tools will also allow you to manage your VMs dynamically, allocating and taking away resources according to their needs.
Because the rules of virtual environments are so different, it’s important to choose monitoring tools that give recommendations, not just reports. What you need is not just one more report, but an expert system that contains a lot of the specialist knowledge any organisation running a virtual environment needs.
You also need a system that can identify problems affecting specific servers, departments or applications. Throwing more RAM at a slow machine, for example, isn’t always the right solution. What if the real problem is disk related, or an application with a memory leak? Monitoring tools need to supply that information - in the specific context of a virtualised environment with overcommitted resources.
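As a toy illustration of why ‘add more RAM’ is not always the answer, a recommendation engine weighs several signals together before pointing at a cause. The metric names and thresholds below are hypothetical, not any particular product’s logic:

```python
def diagnose(metrics):
    """Very simplified triage for a 'slow VM' complaint, using hypothetical monitoring metrics."""
    if metrics["disk_latency_ms"] > 25:
        return "High disk latency: the bottleneck looks like storage I/O, not memory."
    if metrics["memory_used_pct"] > 90 and metrics["memory_growth_pct_per_day"] > 5:
        return "Memory climbs steadily: suspect an application leak before adding RAM."
    if metrics["cpu_ready_pct"] > 10:
        return "High CPU ready time: the host is overcommitted on CPU; more RAM won't help."
    return "No obvious virtual-layer bottleneck - look at the application itself."

# Hypothetical readings for a VM users are complaining about
print(diagnose({
    "disk_latency_ms": 40,
    "memory_used_pct": 70,
    "memory_growth_pct_per_day": 1,
    "cpu_ready_pct": 3,
}))
```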
Human decision-making is always paramount, of course, because humans know things about context and the future that machines can’t. If the monitoring tool notices that between January and June a particular server was underutilised, it may recommend switching resources away from that machine. The human who knows that the company’s financial year-end is coming up in July will also know not to implement that particular recommendation.
Between an IT manager’s knowledge of the business context, and a good monitoring tool’s knowledge of the virtual environment, it is possible to run a virtualised IT shop that delivers both better performance and lower cost, meeting the ROI targets set by the business.
There is no doubt that organisations will continue to virtualise more and more of their IT infrastructure. Traditional IT policies and legacy management and reporting tools are fast becoming inadequate for environments that must deliver ever greater performance, scale and availability. Rethinking IT management to meet these demands - and to get the most from your virtualised infrastructure - is a no-brainer, and it will eliminate the pitfalls that could otherwise seriously undermine the business case for virtualising.