
DAWN - the Internet Edition



Science.com

January 24, 2004



Securing your data with RAID solutions



By Asim Iqbal


IN the not-so-distant past, the concept of data storage was pretty simple. You had a computer, you attached a hard disk to it, and you were done. Over time, however, the need for greater storage capacity and growing concerns about data security have forced drastic changes in computer hardware and software technology, which in turn have made things considerably more complicated for users.

The amount of information and data being stored on hard disks has increased exponentially, along with the need to keep that data accessible. There are many applications, particularly in a business environment, where storage requirements exceed the capacity of a single hard disk. We also know that hard disks can fail unexpectedly. These situations require that the traditional “one hard disk per system” model be set aside in favour of a system that provides more storage capacity and greater reliability.

Today, storage management is one of the most dynamic and fast-changing areas in the IT business, and new storage technologies appear regularly. One of the best storage solutions is RAID (Redundant Array of Inexpensive [or Independent] Disks) technology, which has become a standard in the computing industry for applications requiring fast, reliable storage of large volumes of data. RAID solutions can be used at any level, from desktops and workstations to enterprise servers.

RAID is simply an array of disks that combines multiple physical disks into one logical disk. It provides convenient, low-cost, and highly reliable storage by saving data across all the disks in a predefined order and can offer higher throughput than a single hard disk or group of independent hard disks.

RAID can be implemented via software solutions, hardware solutions (by special-purpose RAID controller cards), or even a mixture of the two. Hardware RAID can be further categorized into I/O processor-based and I/O controller-based, and software RAID can be further categorized into driver-based and OS-based.

Software RAID solutions use host system resources, such as memory and CPU cycles, and therefore affect overall system performance.

Hardware solutions, on the other hand, are more expensive but tend to provide better performance. Hardware RAID controllers also typically provide some extra functionality that is not strictly part of RAID, such as hot swapping of disks (a feature that allows a disk to be replaced while the computer is powered on).
 


The processes

RAID solutions enhance I/O performance and data availability by using the techniques described below.

Mirroring: Disk mirroring involves the simultaneous writing of the same data over one RAID controller to two separate hard disks. The principle behind mirroring is that this 100 per cent data redundancy provides full protection against the failure of either of the disks containing the duplicated data. A mirroring set-up always requires an even number of hard disks, because every disk needs a partner to hold its copy.
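The idea behind mirroring can be illustrated with a small Python sketch. It is a toy model only: the class name MirroredPair and the use of dictionaries as "disks" are assumptions made for illustration, not how any real controller works.

```python
class MirroredPair:
    """Toy model of a two-disk mirror: every block is written to both disks."""

    def __init__(self):
        # Each "disk" is modelled as a dict mapping block number -> bytes.
        self.disks = [{}, {}]
        self.failed = [False, False]

    def write(self, block_no, data):
        # The same data goes to every disk that is still working.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block_no] = data

    def fail(self, disk_index):
        # Simulate a complete disk failure: its contents are lost.
        self.failed[disk_index] = True
        self.disks[disk_index] = {}

    def read(self, block_no):
        # Any surviving disk can serve the read.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block_no]
        raise IOError("both disks in the mirror have failed")


pair = MirroredPair()
pair.write(0, b"payroll")
pair.fail(0)            # lose the first disk
print(pair.read(0))     # the second disk still holds the data
```

The data survives the loss of either disk, but not the loss of both.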

Duplexing: Duplexing is an extension of mirroring. Disk duplexing is the simultaneous writing of the same data over two RAID controllers to two separate disks. So if you were doing mirroring on two hard disks, they would both be connected to a single host adapter or RAID controller. If you were doing duplexing, one of the disks would be connected to one adapter and the other to a second adapter.

Striping: This process involves breaking data into small pieces and distributing them across multiple disks. Striping can be done at the bit level, byte level, or in blocks. In the ideal case, a data file that takes 4 seconds to write to a single disk can be striped across four separate disks in about 1 second, because each disk writes only a quarter of the data.
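A minimal Python sketch of block-level striping follows. The function names stripe and unstripe, and the simple round-robin placement, are assumptions chosen for illustration; real controllers use their own stripe sizes and layouts.

```python
def stripe(data, num_disks, block_size):
    """Split data into blocks and deal them out to the disks round-robin."""
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_no = offset // block_size
        disks[block_no % num_disks].append(data[offset:offset + block_size])
    return disks


def unstripe(disks):
    """Reassemble the original data by reading the disks in the same order."""
    out = []
    longest = max(len(d) for d in disks)
    for row in range(longest):
        for d in disks:
            if row < len(d):
                out.append(d[row])
    return b"".join(out)


disks = stripe(b"ABCDEFGHIJKLMNOP", num_disks=4, block_size=2)
for i, blocks in enumerate(disks):
    print("disk", i, blocks)
```

Each of the four disks ends up holding a quarter of the data, which is why the four writes can proceed in parallel.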

Parity: This refers to a procedure for error detection and/or correction. Parity is additional, redundant information appended to a block of data, and it is typically calculated from the data itself. If one disk in the array fails, the parity information can be used to rebuild the information that is unavailable because of the failed disk. However, a single set of parity information does not provide protection if multiple hard disks fail. In RAID technology the parity calculation is typically performed using a logical operation called “exclusive OR” or “XOR”. More generally, parity is described as one of two types: (a) Even: The data bits plus the parity bit produce an even number of 1s. (b) Odd: The data bits plus the parity bit produce an odd number of 1s. The XOR calculation used in RAID yields even parity.
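The XOR calculation and the rebuild step can be shown in a few lines of Python. This is a sketch under simple assumptions: three equal-sized data blocks with made-up contents, and a helper named xor_blocks invented for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, one per data disk (contents are arbitrary examples).
data_blocks = [b"\x0f\xa0", b"\x33\x0c", b"\x55\xff"]
parity = xor_blocks(data_blocks)

# Pretend the second disk has failed: XOR the survivors with the parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data_blocks[1])
```

The rebuild works because XOR-ing a value with itself cancels it out, so only the missing block remains. If two blocks were missing, the single parity block would not be enough to recover them.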

Most RAID levels provide protection for the data stored on the array. The remarkable benefit is that the data on the array can withstand even the complete failure of one hard disk (or sometimes more) without any data loss, and without requiring any data to be restored from backup. RAID technologies have become integrated into network attached storage (NAS) and storage area network (SAN) technologies.
 


Various levels

RAID can handle data storage using different configurations, which are described in terms of “levels.” The levels make use of physical disks in diverse ways and have different performance, redundancy, storage capacity, reliability, and cost characteristics.

There can also be combinations of RAID levels (Hybrid RAID levels). Different vendors have proposed a number of other RAID levels (Proprietary RAID levels). A fundamental understanding of the different RAID levels is important because each level is optimized for a different use. You need to know what each level can do in order to choose the right one.

RAID 0: Also known as disk striping, RAID Level 0 is the simplest and most affordable RAID level. This level stripes data across two or more disks, without storing any redundant information, resulting in higher data throughput. RAID 0 is supported by all hardware controllers, both SCSI and IDE/ATA, and also most software RAID solutions. It is becoming increasingly popular among performance-seekers, especially in the lower end of the marketplace.

On the plus side, you lose absolutely no space from your volume (full capacity of the installed hard disks can be used because there is no redundancy overhead) and gain maximum performance.

Array capacity is equal to (Size of smallest drive x Number of drives). RAID 0 boosts I/O performance in a wide variety of circumstances by increasing the available I/O bandwidth (data can be simultaneously transferred to and from every disk in the array).

Theoretical bandwidth is increased by a factor of n, where n is the number of drives used. Read/write operations can occur simultaneously. The downside of RAID 0 configurations is that they sacrifice fault tolerance. Striped arrays in RAID 0 provide no protection against data loss. If one disk of the array fails for any reason, all data within the entire array will be lost.
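The RAID 0 trade-off can be put into numbers with a short Python sketch. The capacity and bandwidth formulas are the ones given above; the failure calculation assumes that drives fail independently with the same probability, which is a simplifying assumption, and all figures used are hypothetical.

```python
def raid0_capacity(drive_sizes_gb):
    # Array capacity = size of smallest drive x number of drives.
    return min(drive_sizes_gb) * len(drive_sizes_gb)


def raid0_theoretical_bandwidth(per_drive_mb_s, num_drives):
    # Ideal case only: bandwidth scales by a factor of n.
    return per_drive_mb_s * num_drives


def raid0_failure_probability(p_drive, num_drives):
    # The array is lost if ANY drive fails (assumes independent failures).
    return 1 - (1 - p_drive) ** num_drives


print(raid0_capacity([80, 80, 120]))           # the 120 GB drive is only partly used
print(raid0_theoretical_bandwidth(50, 4))      # four drives at 50 MB/s each
print(raid0_failure_probability(0.05, 4))      # hypothetical 5% failure chance per drive
```

Under these assumptions, every added drive raises both the throughput and the chance of losing the whole array.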

RAID 1: This creates a pair of mirrored drives with exactly the same data. RAID 1 is usually implemented as mirroring, in which each drive has its data duplicated on a second drive. A variant of RAID 1 is duplexing, which duplicates the controller card as well as the drive, providing tolerance against failures of either a drive or a controller. The result of RAID 1 is maximum redundancy, with minimal efficiency of space. It is supported by all hardware controllers, both SCSI and IDE/ATA, and also most software RAID solutions. For highest performance, the controller must be able to perform two concurrent separate reads per mirrored pair or two duplicate writes per mirrored pair.

A two-disk RAID 1 array’s storage capacity is limited to the capacity of the smallest-capacity drive. As the data is always the same on the two drives, read performance is almost doubled (if the controller allows simultaneous reads). Write performance is not improved.

RAID 2: Also referred to as disk striping with Error Checking and Correcting (ECC) or Hamming Code ECC (named after Richard Wesley Hamming of Bell Labs who first utilized the method of implementing ECC memory using the theoretical minimum number of redundant bits).

RAID 2 interleaves data at the bit level across the drives in an array, and error checking information is calculated and written to one or more specially designated disks. It is intended for use with drives which do not have built-in error detection. However, hard disks today integrate strong error checking/correction functions and RAID 2 is rarely implemented (pretty much obsolete).

RAID 2 has slow I/O and is remarkably unreliable (since any of the multiple checksum disks can fail). It is expensive and often requires many drives (a typical setup required 10 data disks and 4 ECC disks for a total of 14, or 32 data disks and 7 ECC disks for a total of 39).
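The number of ECC disks follows from how Hamming codes work: r check bits can protect m data bits when 2^r is at least m + r + 1. The Python sketch below applies that standard condition; the function name hamming_check_bits is an assumption made for illustration.

```python
def hamming_check_bits(data_bits):
    """Smallest r such that 2**r >= data_bits + r + 1 (single-error correction)."""
    r = 0
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r


for m in (10, 32):
    print(m, "data disks need at least", hamming_check_bits(m), "check disks")
```

For 10 data disks the condition gives 4 check disks, matching the first set-up quoted above. For 32 data disks it gives a minimum of 6; the figure of 7 would be consistent with one extra overall parity disk, as in an extended Hamming code, although that is an interpretation rather than something stated here.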

RAID 3: Also known as disk striping with parity or parallel transfer with parity. This is similar to level 2, except that data is striped at the byte level and simple parity is written to a single parity drive instead of ECC checksums to several checksum drives. When one of the data drives fails, the parity drive is used to rebuild the missing data.

RAID 3 is more reliable than level 2 because there is only one parity drive that can fail. The dedicated parity disk generally serves as a performance bottleneck, however, especially for random writes, because it must be accessed any time anything is written to the array.

RAID 4: Referred to as disk striping with large stripes, it is very similar to RAID 3. Parity information is stored on a single disk. The only difference is that it uses block level striping instead of byte level striping.

Block level striping improves random access performance compared to RAID 3, but the dedicated parity disk remains a bottleneck, especially for random write performance.

RAID 5: In this level both the data and the parity information are striped across all disks, providing full data recoverability if any single drive fails. This level is extremely popular and widely used in enterprises of all sizes.

Distributed parity in RAID 5 removes the bottleneck of the dedicated parity drive. The implementation enables all drives to read simultaneously, maximizing read performance. Array capacity is [(Size of smallest drive) x (number of drives - 1)]. RAID 5 is often considered the best combination of performance, redundancy, and storage efficiency.
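The following Python sketch shows the capacity formula and one possible way of rotating parity across the disks. The rotation rule used here is only one common convention, chosen as an assumption for illustration; actual controllers may place parity differently. The function names are also invented for the example.

```python
def raid5_capacity(drive_sizes_gb):
    # RAID 5 needs at least three drives; one drive's worth of space holds parity.
    if len(drive_sizes_gb) < 3:
        raise ValueError("RAID 5 needs at least three drives")
    return min(drive_sizes_gb) * (len(drive_sizes_gb) - 1)


def parity_disk_for_stripe(stripe_no, num_disks):
    # Assumed rotation: parity starts on the last disk and moves one disk
    # to the left with each successive stripe.
    return (num_disks - 1 - stripe_no) % num_disks


def layout(num_stripes, num_disks):
    """Return a table of stripes showing data (D) and parity (P) blocks."""
    rows = []
    for s in range(num_stripes):
        p = parity_disk_for_stripe(s, num_disks)
        row, d = [], 0
        for disk in range(num_disks):
            if disk == p:
                row.append("P%d" % s)
            else:
                row.append("D%d" % (s * (num_disks - 1) + d))
                d += 1
        rows.append(row)
    return rows


print(raid5_capacity([100, 100, 100, 100]))
for row in layout(4, 4):
    print(row)
```

Over four stripes each of the four disks holds exactly one parity block, so parity writes are spread evenly instead of landing on a single dedicated disk as in RAID 3 and RAID 4.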
 


Securing data

It is about time you invested time and money in securing your data. Although you can use CD-R/RW for backup, RAID can still be considered a much better solution, because with a redundant level such as RAID 1 or RAID 5 you don’t need to worry about when the last backup was made. RAID 0 on its own improves performance but, as noted above, offers no protection. For most PC users RAID 1 (mirroring) is excellent. If you are a serious computer user, or your data is important to you and you need better I/O performance, you should consider RAID 10. It is a hybrid level (RAID 0 + RAID 1), which gives you the benefits of both striping and mirroring. It is somewhat costly to implement because every disk in the stripe set must be mirrored with a second disk.
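The cost and fault tolerance of RAID 10 can be sketched in Python as follows. The disk numbering (disks 2k and 2k+1 forming mirrored pair k) and the function names are assumptions made for illustration only.

```python
def raid10_capacity(drive_sizes_gb):
    # Every disk is mirrored, so only half of the drives contribute capacity.
    n = len(drive_sizes_gb)
    if n < 4 or n % 2 != 0:
        raise ValueError("this sketch expects an even number of drives, at least four")
    return min(drive_sizes_gb) * (n // 2)


def raid10_survives(num_pairs, failed_disks):
    # Assumed numbering: disks 2k and 2k+1 form mirrored pair k.
    # The array survives as long as no pair has lost both of its disks.
    failed = set(failed_disks)
    return all(not ({2 * k, 2 * k + 1} <= failed) for k in range(num_pairs))


print(raid10_capacity([100, 100, 100, 100]))   # half of the raw 400 GB is usable
print(raid10_survives(2, [0, 2]))              # one disk lost from each pair
print(raid10_survives(2, [0, 1]))              # both disks of one pair lost
```

Half of the raw capacity goes to mirroring, which is the cost mentioned above. In return the array can survive more than one disk failure, provided the failed disks belong to different mirrored pairs.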

The writer is a young scholar of electrical engineering at the University of Engineering & Technology, Lahore



© DAWN Group of Newspapers, 2005