Best practices for data recovery
Tuesday, 10 August, 2010
Data is a vital business asset for most organisations. Although high-availability technologies significantly reduce the probability of data loss, events such as technical failure, virus attacks, deliberate sabotage or employee error can still destroy data. Therefore, organisations must plan for data recovery and implement mechanisms for managing the data recovery process, says CA’s Scott Caulfield.
Organisations that suffer severe data loss are at high risk of subsequent business failure. Even small periods of downtime can incur significant costs. The challenge for organisations is to implement an effective recovery management environment that maximises the efficiency of the data recovery process.
When protecting information, speed is everything. Before computers were widely used in the business environment, information written on paper records was central to daily operations. Therefore, businesses took strenuous precautions to protect records from potential disasters, particularly the threat of fire.
Now computers are widely used and data records have replaced paper ledgers. For most businesses, this data is critical and includes lists of potential or current customers, invoices, receipts, sales figures and marketing information. This data exists only as bits on a hard drive. If it is lost irretrievably, an organisation can fail within days, and even organisations that successfully recover their data remain at risk of subsequently failing.
Even the most highly redundant system cannot be completely reliable, and even perfect reliability would not prevent a user from accidentally deleting a vital file. Disasters and errors are inevitable; the challenge is to protect data from disasters and to restore it quickly and efficiently when they occur.
When recovering data, every second counts. Managing the data recovery process effectively is a crucial factor for ensuring a rapid return to operational conditions. Good data recovery management applies to all data types, from recovering a single corrupt mailbox or replacing an accidentally deleted file to restoring an entire business-critical database.
Fast recovery enables organisations to meet service level agreement (SLA) requirements, to minimise the cost of data being unavailable and to return to a working environment as quickly as possible. It is critical to know your environment and plan for failure.
Top tips for best practice in fast and effective data recovery
- Match recovery management technology to data value
All organisations hold data of varying value, so recovery management technology should be matched to the value of the data. Based on downtime costs, firms should use a mix of traditional backup-and-restore technology and disk-to-disk-to-tape (D2D2T) hierarchies, possibly including virtual tape libraries (VTLs), inter-site data replication, data rewinding and virtualisation.
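As a rough illustration of matching protection to data value, the Python sketch below assigns each data set a protection tier from its estimated hourly downtime cost. The tier names, the cost thresholds and the `DataSet`/`choose_tier` names are hypothetical assumptions for illustration; a real plan would derive its thresholds from the organisation's own downtime-cost analysis.

```python
from dataclasses import dataclass

# Hypothetical tiers, ordered from most to least expensive protection.
# The cost thresholds are illustrative assumptions, not figures from the article.
TIERS = [
    (10_000.0, "inter-site replication + CDP"),
    (1_000.0, "D2D2T with virtual tape library"),
    (0.0, "traditional backup-and-restore to tape"),
]


@dataclass
class DataSet:
    name: str
    downtime_cost_per_hour: float  # estimated business cost of one hour offline


def choose_tier(ds: DataSet) -> str:
    """Return the first (most protective) tier whose threshold the cost meets."""
    for threshold, tier in TIERS:
        if ds.downtime_cost_per_hour >= threshold:
            return tier
    raise ValueError("downtime cost must be non-negative")


if __name__ == "__main__":
    for ds in [DataSet("orders-db", 25_000), DataSet("file-share", 1_500),
               DataSet("archive", 50)]:
        print(f"{ds.name}: {choose_tier(ds)}")
```

Checking thresholds from the top down means each data set receives the strongest protection its downtime cost justifies, and nothing more.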
- Automate the recovery process
Recovery costs can be significant, particularly when manual intervention is needed. Automating the recovery process as much as possible brings down cost and reduces the probability of failure. Automation is even more important in complex and distributed environments.
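A minimal sketch of an automated recovery run, assuming each step (for example mounting backup media, restoring, then verifying) can be wrapped as a function that reports success or failure. The names `run_with_retries` and `automated_restore` are hypothetical; the idea is that retry and stop-on-failure logic replaces an operator's manual judgement, and the returned log can feed existing reporting.

```python
import time
from typing import Callable


def run_with_retries(step: Callable[[], bool], attempts: int = 3,
                     delay_s: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Run one recovery step, retrying with exponential backoff.

    `step` returns True on success. Returns True if any attempt succeeded.
    """
    for attempt in range(attempts):
        if step():
            return True
        if attempt < attempts - 1:
            sleep(delay_s * (2 ** attempt))  # waits 1 s, 2 s, 4 s, ... by default
    return False


def automated_restore(steps: list[tuple[str, Callable[[], bool]]],
                      sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Run named steps in order and stop at the first one that still fails
    after its retries. Returns a log of outcomes for operators or alerting."""
    log = []
    for name, step in steps:
        ok = run_with_retries(step, sleep=sleep)
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            break  # later steps (e.g. verification) make no sense after a failure
    return log
```

The `sleep` parameter is injectable so that tests or dry runs need not actually wait between retries.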
As soon as an organisation outgrows a single office, backup and recovery operations become more complex. Wherever possible, organisations should centralise recovery management processes, ideally by centralising the data needed to recover operations. Centralised recovery management enables backup policies to be applied consistently and lets the organisation keep physical control of data and backup sets.
One key factor in centralising recovery management is an integrated suite of management tools and associated policies that require minimal staffing. These tools should integrate with the existing alert and notification infrastructure to provide timely warning of data protection issues.
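The sketch below shows one way a central tool might turn backup job status into warnings for the existing notification infrastructure. The `BackupJob` record, the 24-hour threshold and the message formats are assumptions; delivering the messages (by email, SNMP trap or a monitoring console) is left to whatever system is already in place.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class BackupJob:
    server: str
    last_success: Optional[datetime]  # UTC time of last good backup, None if never
    last_status: str                  # e.g. "success" or "failed"


def backup_alerts(jobs: list[BackupJob], now: datetime,
                  max_age: timedelta = timedelta(hours=24)) -> list[str]:
    """Return alert messages for jobs that failed or whose last good backup is too old."""
    alerts = []
    for job in jobs:
        if job.last_status != "success":
            alerts.append(f"{job.server}: last backup {job.last_status}")
        if job.last_success is None:
            alerts.append(f"{job.server}: no successful backup on record")
        elif now - job.last_success > max_age:
            age_h = (now - job.last_success).total_seconds() / 3600
            alerts.append(f"{job.server}: last good backup {age_h:.0f} h ago")
    return alerts


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    jobs = [BackupJob("db01", now - timedelta(hours=30), "failed")]
    for message in backup_alerts(jobs, now):
        print(message)  # hand off to the existing notification system here
```

Checking the age of the last *successful* backup, not just the last job's status, catches jobs that silently stopped running.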
The importance of management tools should not be underestimated. Dashboards and visual reports that simplify monitoring and management, and reduce the resources needed to do both effectively, are critical. Being able to scan a dashboard quickly and then drill down into the details of an issue is another key factor in minimising downtime.
It is also important to view the storage environment in the context of the broader technology environment. Understanding what is happening with the hardware and operating systems of your critical backup servers, and with infrastructure such as the LAN and WAN, is an important part of a comprehensive central storage management strategy. If this information can be linked into the management dashboards, your ability to manage and protect critical data effectively will be significantly improved.
- Maximise backup windows
When organisations operate in a global market over a 24-hour business day, allocating backup windows can be challenging. Branch offices in other time zones will require separate backup schedules from the main office.
Recovery management must integrate with the backup process to track backup sets without manual intervention, regardless of the time zone of the backed-up server. This time-sensitive information must be available during the restore process so that the correct version of a file or database can be restored. Snapshots are an increasingly popular mechanism for reducing the time needed to back up a network resource.
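To illustrate time-zone-independent tracking, this sketch catalogues each backup set with a UTC timestamp and then selects the newest set taken at or before a requested restore point. It assumes each server reports its local time plus a fixed UTC offset (daylight-saving changes are ignored for simplicity); `BackupSet`, `catalogue` and `select_for_restore` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class BackupSet:
    server: str
    taken_at: datetime  # timezone-aware, normalised to UTC when catalogued
    label: str


def catalogue(server: str, local_time: datetime, utc_offset_hours: float,
              label: str) -> BackupSet:
    """Record a backup set, converting the server's local timestamp to UTC."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    aware = local_time.replace(tzinfo=tz)
    return BackupSet(server, aware.astimezone(timezone.utc), label)


def select_for_restore(sets: list[BackupSet], server: str,
                       restore_point: datetime) -> Optional[BackupSet]:
    """Pick the newest backup of `server` taken at or before `restore_point`
    (which must be timezone-aware). Returns None if no such backup exists."""
    candidates = [s for s in sets
                  if s.server == server and s.taken_at <= restore_point]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.taken_at)
```

Because every timestamp is normalised to UTC, a branch office's 02:00 backup and the head office's 01:00 backup can be compared directly. The restore point must be timezone-aware: Python raises `TypeError` when comparing naive and aware datetimes.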
- Consider continuous data protection
Continuous data protection (CDP) technologies enhance both backup and data replication by providing the option to return a data source to the state it was in at a specific time. For example, if corruption occurs in an email storage group, CDP can sequentially undo every database write and transaction log update until the database returns to a consistent state.
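The following toy key-value store illustrates the journaling idea behind CDP: every write records the value it overwrote, so the store can be rewound by undoing writes, newest first. It is a simplified model, not how any particular CDP product works; positions are write sequence numbers rather than timestamps, and `CDPStore` and `rewind_to` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any

_MISSING = object()  # marks a key that did not exist before a write


@dataclass
class JournalEntry:
    seq: int
    key: str
    before: Any  # value prior to the write, or _MISSING


class CDPStore:
    """Toy key-value store that journals every write so it can be rewound."""

    def __init__(self) -> None:
        self.data: dict[str, Any] = {}
        self.journal: list[JournalEntry] = []
        self._seq = 0

    def write(self, key: str, value: Any) -> int:
        """Apply a write, journal the overwritten value and return its sequence number."""
        self._seq += 1
        self.journal.append(JournalEntry(self._seq, key, self.data.get(key, _MISSING)))
        self.data[key] = value
        return self._seq

    def rewind_to(self, seq: int) -> None:
        """Undo writes newer than `seq`, newest first, restoring the state
        that existed immediately after write `seq`."""
        while self.journal and self.journal[-1].seq > seq:
            entry = self.journal.pop()
            if entry.before is _MISSING:
                del self.data[entry.key]
            else:
                self.data[entry.key] = entry.before
```

Undoing in reverse order matters: if a key was written several times after the rewind point, only the oldest journalled value (the one restored last) reflects the earlier state.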
CDP is an essential part of a recovery management plan when an organisation runs critical applications whose data is deemed highly valuable. In this case, extending the solution to include ‘high availability’ should be carefully considered: if a critical server actually fails, high availability will reduce system downtime dramatically.
- Consider virtualisation technologies
The growing popularity of virtualisation for consolidating servers and simplifying server management creates opportunities for recovery management in disaster recovery scenarios. These opportunities include:
- Increased numbers of standby systems - Virtualisation enables standby sites to accommodate more standby systems while still maintaining separation between the standby systems.
- Reduced costs - Virtualisation runs multiple virtual servers on a single host computer, which reduces costs such as cooling, power, racking hardware and network hardware.
- Integrated high availability and load balancing - Virtualisation platforms can integrate both high availability and load balancing. For example, an organisation can run high availability technologies such as clustering on virtual machines by using iSCSI connections to shared storage (NAS or SAN).
- Simplified server deployment - Virtualisation simplifies the process of server deployment and reduces the deployment time.
- Don’t forget security
Because backup sets, snapshots and replicas can contain complete copies of organisational data, the highest protection should be accorded to this data. The same security principles and layered security model should be applied to backup sets and backup media as for a production server, with the additional consideration that backup media is more portable and easier to remove than a computer.
An organisation must also control who has the right to back up and restore data, and who can authorise the restore process for different data types. Restore authorisation should depend on the value of the data and the effect of restoring it.
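A minimal sketch of a restore-authorisation check, assuming data is classified into three value levels and that high-value restores need sign-off from a second, approving role. The roles, levels and policy table are hypothetical examples; each organisation would define its own.

```python
from enum import IntEnum
from typing import Optional


class Value(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical policy: which roles may run a restore of each data class,
# and whether a separate approver must authorise it first.
POLICY = {
    Value.LOW: {"operators": {"helpdesk", "backup_admin"}, "needs_approval": False},
    Value.MEDIUM: {"operators": {"backup_admin"}, "needs_approval": False},
    Value.HIGH: {"operators": {"backup_admin"}, "needs_approval": True},
}
APPROVERS = {"it_manager", "data_owner"}  # hypothetical approving roles


def may_restore(requester_role: str, data_value: Value,
                approver_role: Optional[str] = None) -> bool:
    """Return True if this role may restore data of this value."""
    rule = POLICY[data_value]
    if requester_role not in rule["operators"]:
        return False
    if rule["needs_approval"]:
        return approver_role in APPROVERS  # None (no approver) is never in the set
    return True
```

Keeping the policy in a single table makes it easy to review and to apply consistently from a central recovery management point.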
- Test the restore strategy
The truism of data protection is that the only test of a successful backup is a successful restore. The challenge is that a real-world test on production systems places a significant load on both systems and personnel. It is therefore essential that the technology used in the backup and disaster recovery process can validate what data has been backed up and replicated. Such validation also provides a level of comfort without the burden of a full DR test.
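One common way to validate a backup without a full DR test is to record a checksum manifest when the backup is taken and later compare the copy against it. The sketch below uses SHA-256 from Python's standard `hashlib`; `build_manifest` and `verify` are hypothetical helper names, and a real product would typically store the manifest alongside its backup catalogue.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {p.relative_to(root).as_posix(): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(source_manifest: dict[str, str], copy_root: Path) -> list[str]:
    """Compare a backup copy against the manifest taken at backup time.
    Returns a list of problems; an empty list means every file matches."""
    copy_manifest = build_manifest(copy_root)
    problems = []
    for rel, digest in source_manifest.items():
        if rel not in copy_manifest:
            problems.append(f"missing: {rel}")
        elif copy_manifest[rel] != digest:
            problems.append(f"changed: {rel}")
    return problems
```

A matching manifest shows the copy is complete and unaltered, but it does not replace the staged trial restores described below, which prove that applications can actually run from the restored data.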
As such, data restores cannot be one-off operations that happen after a disaster. Organisations must make them an integral part of managing their network by scheduling them in monthly maintenance operations.
Trial restores must be staged, ideally to an environment that closely resembles the operational set-up. Only successful restorations will prove that the disaster recovery plan works and build management's and staff's confidence in it.
For any organisation, data recovery should become part of the monthly maintenance operations, and recovery management should have a central role in any disaster recovery plan.
* Scott Caulfield is the Business Unit Director for CA’s Recovery Management and Data Modelling Business Unit, where he is responsible for the day-to-day running of the business unit. In this role, Caulfield oversees sales, marketing and presales operations, and is also responsible for reviewing future investments such as developing the RMDM SaaS offering.