Network data storage devices – modern RAID arrays similarities, differences and convergence
What are the similarities and differences between modern network data storage devices, and how are they converging?
Data storage solutions store data electronically and make it machine readable. Storage is the process by which digital data is saved within a data storage device by means of computing technology. It is the mechanism that enables a computer to retain data, either temporarily or permanently.
Storage devices such as flash drives and hard disks are a fundamental component of most digital devices, since they allow users to preserve all kinds of information, such as documents, videos, pictures and raw data.
Some common storage devices:
- Hard disks (including solid-state drives);
- Flash drives and Memory cards;
- Floppy disks;
- Tape drives;
- CD/DVD discs;
- Magneto-Optical disks.
The most widespread standard for configuring multiple hard disk drives is RAID (Redundant Array of Inexpensive Disks, also expanded as Redundant Array of Independent Disks), which comes in a number of standard configurations and non-standard (nested or hybrid) configurations. There are several different ways to organize hard drive arrays:
Standard RAID levels
The standard RAID levels comprise a basic set of RAID configurations that employ the techniques of striping, mirroring, or parity to create large reliable data stores from multiple general-purpose computer hard disk drives.
- RAID 0 (stripe set without parity, redundancy or fault tolerance)
- RAID 1 (mirroring)
- RAID 2 (rarely used, stripes data at the bit, rather than block, level)
- RAID 3 (rarely used, byte-level striping with dedicated parity disk)
- RAID 4 (block-level striping with a dedicated parity disk)
- RAID 5 (distributed parity)
- RAID 6 (dual parity)
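The parity technique behind RAID 5 can be illustrated with a few lines of Python. This is only a minimal sketch of single XOR parity over one stripe, not a description of how any particular controller works; the block contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe spread across three data disks (contents are illustrative).
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data_blocks)

# Simulate losing the second disk: XOR the surviving data blocks with
# the parity block to rebuild the missing one.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```

Because XOR is its own inverse, any single missing block in the stripe can be recovered this way. RAID 6 adds a second, differently computed parity block so that two missing blocks can be recovered.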
Nested RAID levels
Nested RAID levels are usually identified by a series of digits, and the most commonly used levels use two. The leftmost digit in the designation denotes the lowest RAID level in the "stack", while the rightmost one denotes the highest layered RAID level.
- RAID 01 (RAID 0+1) (a mirror of stripes)
- RAID 03 (RAID 0+3) (byte-level striping with dedicated parity)
- RAID 10 (RAID 1+0) (a stripe of mirrors)
- RAID 50 (RAID 5+0) (block-level striping of RAID 0 with the distributed parity of RAID 5)
- RAID 60 (RAID 6+0) (block-level striping of RAID 0 with the distributed double parity of RAID 6)
- RAID 100 (RAID 10+0) (a stripe of RAID 10s)
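Each of these levels trades usable capacity for redundancy. The sketch below estimates both for some of the levels listed above, assuming identical disks, two-way mirrors in RAID 10, and equally sized sub-arrays in RAID 50/60. The function name and the `groups` parameter are hypothetical, introduced only for illustration.

```python
def usable_capacity(level, disks, disk_tb, groups=1):
    """Return (usable TB, guaranteed number of tolerated disk failures).

    `groups` is the number of RAID 5/6 sub-arrays striped together in
    RAID 50/60; it is ignored for the other levels.
    """
    if level == "0":
        return disks * disk_tb, 0
    if level == "1":                      # n-way mirror of a single disk
        return disk_tb, disks - 1
    if level == "5":                      # one disk's worth of parity
        return (disks - 1) * disk_tb, 1
    if level == "6":                      # two disks' worth of parity
        return (disks - 2) * disk_tb, 2
    if level == "10":                     # stripe of two-way mirrors
        return disks // 2 * disk_tb, 1
    if level == "50":                     # one parity disk per sub-array
        return (disks - groups) * disk_tb, 1
    if level == "60":                     # two parity disks per sub-array
        return (disks - 2 * groups) * disk_tb, 2
    raise ValueError(f"unsupported RAID level: {level}")


# Eight 4 TB disks; RAID 50/60 built from two sub-arrays of four disks.
for level in ["0", "1", "5", "6", "10", "50", "60"]:
    capacity, failures = usable_capacity(level, disks=8, disk_tb=4, groups=2)
    print(f"RAID {level}: {capacity} TB usable, survives {failures} failure(s)")
```

The failure count is the worst case. A RAID 10 or RAID 50 array can survive more failures if they happen to land in different mirrors or sub-arrays.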
Enterprise redundant RAID configurations often allow drives to be hot swapped. Hot-swappable drives enable IT engineers to remove a failed drive and replace it with a spare using specialized drive enclosures, without having to shut down the system. This is useful in environments where the drive array serves a mission-critical purpose and downtime is not an option.
Various RAID configurations are used in Direct-Attached Storage (DAS), Network-Attached Storage (NAS) and Storage Area Network (SAN) architectures.
Direct-attached storage (DAS)
Direct-attached storage (DAS) is digital storage directly attached to the computer or server accessing it, as opposed to storage accessed over a computer network (i.e. network-attached storage). DAS was the precursor to NAS. Because no network sits between the host and its disks, DAS performs well for compute-intensive software. However, being directly attached, it does not lend itself well to sharing, and managing many separate DAS installations is complex.
Network-attached storage (NAS)
Network-attached storage (NAS) is a file-level (as opposed to block-level) computer data storage server connected to a computer network, providing data access to a heterogeneous group of clients. NAS devices are accessible over a network using an Ethernet connection and file protocols like SMB/CIFS (Server Message Block/Common Internet File System) or NFS (Network File System).
NAS is a popular way of creating network file shares within an organization, where authorized personnel often collaborate on the same files or other forms of business information. It can also be used to keep a backup of files in case a local drive fails and to consolidate multimedia libraries, among other use cases where storing and exchanging files over a local network comes in handy.
Typically, the more high-end the NAS system, the more RAID configuration options are available. High-end systems for larger organizations from the likes of Dell EMC, HPE, and NetApp offer a plethora of RAID options that storage administrators can use to meet their file storage capacity, performance and data protection requirements. NAS and RAID are independent technologies: NAS appliances can use RAID, and in many cases the two work well together, but each can also be used without the other. Home and enterprise users can create a RAID configuration unless they choose JBOD (just a bunch of disks) mode.
Performance-wise, the components used to build a NAS determine its overall performance:
- CPU: budget NAS devices will have low-end processors, while enterprise NAS systems are often powered by server-grade processors like Intel's line of Xeon CPUs.
- RAM: low-end NAS devices can get by with meager amounts of RAM, while high-end systems can offer gigabytes of memory to cache file operations.
- Drives: opting for enterprise NAS-grade drives will ensure they deliver reliably fast performance with better read/write speeds and better throughput rates. For the ultimate in performance, some vendors like Dell EMC or Synology offer all-flash NAS arrays outfitted with fast SSDs (solid-state drives).
In a RAID configuration, performance characteristics are governed by the quality and type of hard drives used, the type of RAID controller and the RAID level selected. For example, a RAID 6 implementation will deliver good read speeds, while write speeds suffer somewhat because the array needs to calculate and store parity information to provide fault tolerance.
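The write cost of parity can be roughed out with the commonly quoted "write penalty" figures, which count the back-end disk operations behind each small random host write. The sketch below is a back-of-the-envelope estimate under simplifying assumptions (no controller cache, no full-stripe writes, identical disks). The function name and the drive figures are assumptions chosen for illustration.

```python
# Back-end I/Os needed per small random host write (commonly quoted values).
WRITE_PENALTY = {"0": 1, "1": 2, "10": 2, "5": 4, "6": 6}


def host_iops(level, disks, iops_per_disk, read_fraction):
    """Estimate host-visible IOPS for a given read/write mix."""
    raw = disks * iops_per_disk
    write_fraction = 1.0 - read_fraction
    # Reads cost one back-end I/O each; writes cost WRITE_PENALTY[level].
    return raw / (read_fraction + write_fraction * WRITE_PENALTY[level])


# Eight drives at an assumed 150 IOPS each, with a 70% read workload.
for level in ["0", "10", "5", "6"]:
    estimate = host_iops(level, disks=8, iops_per_disk=150, read_fraction=0.7)
    print(f"RAID {level}: about {estimate:.0f} host IOPS")
```

With these assumed numbers the same eight drives deliver roughly 480 host IOPS in RAID 6 against roughly 1200 in RAID 0, which is the write slowdown described above expressed as arithmetic.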
Storage Area Network (SAN)
A storage area network (SAN) or storage network is a computer network which provides access to consolidated, block-level data storage. SANs are primarily used to access data storage devices, such as disk arrays and tape libraries from servers so that the devices appear to the operating system as direct-attached storage. A SAN typically is a dedicated network of storage devices not accessible through the local area network (LAN). SANs have their own networking devices, such as SAN switches. To access the SAN, so-called SAN servers are used, which in turn connect to SAN host adapters. Within the SAN, a range of data storage devices may be interconnected, such as SAN-capable disk arrays and tape libraries.
SAN architecture includes several layers:
- Host layer (Servers that allow access to the SAN and its storage devices are said to form the host layer of the SAN.)
- Fabric layer (The fabric layer consists of SAN networking devices that include SAN switches, routers, protocol bridges, gateway devices, and cables.)
- Storage layer (The serialized Small Computer Systems Interface (SCSI) protocol is often used on top of the Fibre Channel switched fabric protocol in servers and SAN storage devices. The Internet Small Computer Systems Interface (iSCSI) over Ethernet and the InfiniBand protocols may also be found implemented in SANs, but are often bridged into the Fibre Channel SAN. However, InfiniBand and iSCSI storage devices, in particular disk arrays, are available.)
Storage networks may also be built using Serial Attached SCSI (SAS) and Serial ATA (SATA) technologies. SAS evolved from SCSI direct-attached storage. SATA evolved from Parallel ATA direct-attached storage. SAS and SATA devices can be networked using SAS Expanders.
The Storage Networking Industry Association (SNIA) defines a SAN as "a network whose primary purpose is the transfer of data between computer systems and storage elements". However, a SAN does not consist of a communication infrastructure alone; it also has a software management layer. This software organizes the servers, storage devices, and the network so that data can be transferred and stored. Because a SAN does not use direct-attached storage (DAS), the storage devices in the SAN are not owned and managed by a server.
SAN management software is installed on one or more servers, and management clients are installed on the storage devices. Two approaches have developed in SAN management software: in-band and out-of-band management. In-band means that management data between server and storage devices is transmitted on the same network as the storage data, while out-of-band means that management data is transmitted over dedicated links.
Although a SAN provides only block-level access, file systems built on top of SANs do provide file-level access and are known as shared-disk file systems.
SAN Storage QoS enables the desired storage performance to be calculated and maintained for network customers accessing the device. Some factors that affect SAN QoS are:
- Bandwidth – The rate of data throughput available on the system.
- Latency – The time delay for a read/write operation to execute.
- Queue depth – The number of outstanding operations waiting to execute to the underlying disks (traditional or solid-state drives).
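The three factors above are linked. Assuming a steady workload of fixed-size operations, Little's law ties queue depth and latency to the achievable operation rate, and the operation rate times the block size gives the bandwidth consumed. The helper names and the example figures below are assumptions made for this sketch.

```python
def iops_from_queue(queue_depth, latency_ms):
    """Little's law: sustained IOPS = outstanding operations / latency."""
    return queue_depth / (latency_ms / 1000.0)


def throughput_mib_s(iops, block_kib):
    """Bandwidth consumed by a stream of fixed-size operations."""
    return iops * block_kib / 1024.0


# Assumed workload: 32 outstanding 8 KiB operations at 2 ms latency each.
iops = iops_from_queue(queue_depth=32, latency_ms=2.0)
bandwidth = throughput_mib_s(iops, block_kib=8)
print(f"{iops:.0f} IOPS, {bandwidth:.0f} MiB/s")
```

With these figures the result is about 16000 IOPS and 125 MiB/s. If latency doubles at the same queue depth, the sustainable operation rate and the bandwidth both halve, which is why a QoS policy usually has to consider all three values together.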
Over-provisioning can be used as an alternative to QoS, providing additional capacity to compensate for peak network traffic loads. However, where network loads are not predictable, over-provisioning can eventually cause all bandwidth to be fully consumed and latency to increase significantly, resulting in SAN performance degradation.
How NAS and SAN compare with each other
NAS provides both storage and a file system. This is often contrasted with SAN (storage area network), which provides only block-based storage and leaves file system concerns on the "client" side. SAN protocols include Fibre Channel, iSCSI, ATA over Ethernet (AoE) and HyperSCSI.
One way to loosely conceptualize the difference between a NAS and a SAN is that NAS appears to the client OS (operating system) as a file server (the client can map network drives to shares on that server) whereas a disk available through a SAN still appears to the client OS as a disk, visible in disk and volume management utilities (along with client's local disks), and available to be formatted with a file system and mounted.
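The same distinction can be shown with a toy model. The sketch below is purely illustrative and uses an in-memory buffer rather than any real SAN or NAS protocol. `BlockDevice` stands in for what a SAN presents (numbered blocks with no notion of files), and `ToyFileSystem` stands in for the layer that a SAN client must supply itself and that a NAS already includes. All class names and the block size are assumptions.

```python
import io

BLOCK_SIZE = 512  # bytes per block; an assumed value for this sketch


class BlockDevice:
    """What a SAN exposes: numbered fixed-size blocks, no notion of files."""

    def __init__(self, num_blocks):
        self._store = io.BytesIO(bytes(num_blocks * BLOCK_SIZE))

    def write_block(self, lba, data):
        self._store.seek(lba * BLOCK_SIZE)
        # Pad short writes with zero bytes so every block is full size.
        self._store.write(data.ljust(BLOCK_SIZE, b"\0")[:BLOCK_SIZE])

    def read_block(self, lba):
        self._store.seek(lba * BLOCK_SIZE)
        return self._store.read(BLOCK_SIZE)


class ToyFileSystem:
    """The layer above the blocks: file names mapped to block numbers."""

    def __init__(self, device):
        self.device = device
        self.table = {}      # file name -> (list of block numbers, length)
        self.next_free = 0   # naive allocator: blocks are never reused

    def write_file(self, name, payload):
        blocks = []
        for offset in range(0, len(payload), BLOCK_SIZE):
            chunk = payload[offset:offset + BLOCK_SIZE]
            self.device.write_block(self.next_free, chunk)
            blocks.append(self.next_free)
            self.next_free += 1
        self.table[name] = (blocks, len(payload))

    def read_file(self, name):
        blocks, length = self.table[name]
        raw = b"".join(self.device.read_block(lba) for lba in blocks)
        return raw[:length]


lun = BlockDevice(num_blocks=16)     # SAN-style view: just a disk
fs = ToyFileSystem(lun)              # client-side file system on top
fs.write_file("report.txt", b"quarterly numbers")
print(fs.read_file("report.txt"))    # b'quarterly numbers'
print(lun.read_block(0)[:9])         # b'quarterly'
```

A NAS client only ever sees the equivalent of `read_file` and `write_file`, while a SAN client sees `read_block` and `write_block` and decides for itself how files are laid out on them.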
Despite their differences, SAN and NAS are not mutually exclusive and may be combined as a SAN-NAS hybrid, offering both file-level protocols (NAS) and block-level protocols (SAN) from the same system.
The Main Differences Between NAS and SAN
- Transfer medium. NAS uses TCP/IP networks, most commonly Ethernet. Traditional SANs typically run on high speed Fibre Channel networks, although more SANs are adopting IP-based fabric because of FC’s expense and complexity. High performance remains a SAN requirement and flash-based fabric protocols are helping to close the gap between FC speeds and slower IP.
- Data processing. The two storage architectures process data differently: NAS processes file-based data and SAN processes block data. The story is not quite as straightforward as that, of course: NAS may operate with a global namespace, and SANs have access to a specialized SAN file system. A global namespace aggregates multiple NAS file systems to present a consolidated view. SAN file systems enable servers to share files. Within the SAN architecture, each server normally maintains a dedicated, non-shared LUN (logical unit number). SAN file systems allow servers to safely share data by providing file-level access to multiple servers on the same LUN.
- Protocols. NAS connects directly to an Ethernet network via a cable into an Ethernet switch. NAS can use several protocols to connect with servers, including NFS, SMB/CIFS, and HTTP. On the SAN side, servers communicate with SAN disk drive devices using the SCSI protocol. The network is formed using SAS/SATA fabrics, or mapping layers to other protocols, such as Fibre Channel Protocol (FCP), which maps SCSI over Fibre Channel, or iSCSI, which maps SCSI over TCP/IP.
- Performance. SANs are the higher performers for environments that need high-speed traffic such as high transaction databases and ecommerce websites. NAS generally has lower throughput and higher latency because of its slower file system layer, but high-speed networks can make up for performance losses within NAS.
- Scalability. Entry level NAS devices are not highly scalable, but high-end NAS systems scale to petabytes using clusters or scale-out nodes. In contrast, scalability is a major driver for purchasing a SAN. Its network architecture enables admins to scale performance and capacity in scale-up or scale-out configurations.
- Price. Although a high-end NAS will cost more than an entry-level SAN, in general NAS is less expensive to purchase and maintain. NAS devices are considered appliances and have fewer hardware and software management components than a storage area network. Administrative costs also figure into the equation. SANs are more complex to manage, with FC SANs on top of the complexity heap. A rule of thumb is to figure 10 to 20 percent of the purchase cost as an annual maintenance calculation.
- Ease of management. In a one-to-one comparison, NAS wins the ease of management contest. The device easily plugs into the LAN and offers a simplified management interface. SANs require more administration time than the NAS device. Deployment often requires making physical changes to the data center, and ongoing management typically requires specialized admins. The exception to the SAN-is-harder argument is multiple NAS devices that do not share a common management console.
What is NAS storage used for?
- File storage and sharing. This is the major NAS use case in mid-sized businesses, SMBs, and enterprise remote offices. A single NAS device allows IT to consolidate multiple file servers for simplicity, ease of management, and space and energy savings.
- Active archives. Long-term archives are best stored on less expensive storage like tape or cloud-based cold storage. NAS is a good choice for searchable and accessible active archives, and high capacity NAS can replace large tape libraries for archives.
- Big data. Businesses have several choices for big data: scale-out NAS, distributed JBOD nodes, all-flash arrays, and object-based storage. Scale-out NAS is good for processing large files, ETL (extract, transform, load), intelligent data services like automated tiering, and analytics. NAS is also a good choice for large unstructured data such as video surveillance and streaming, and post-production storage.
- Virtualization. Not everyone is sold on using NAS for virtualization networks, but the use case is growing, and VMware and Hyper-V both support their datastores on NAS. This is a popular choice for new or small virtualization environments when the business does not already own a SAN.
- Virtual desktop infrastructure (VDI). Mid-range and high-end NAS systems offer native data management features that support VDI, such as fast desktop cloning and data deduplication.
Benefits of SAN - accelerate, scale, and protect.
- Databases and ecommerce websites. General file serving or NAS will do for smaller databases, but high-speed transactional environments need the SAN’s high I/O processing speeds and very low latency. This makes SANs a good fit for enterprise databases and high traffic ecommerce websites.
- Fast backup. Server operating systems view the SAN as attached storage, which enables fast backup to the SAN. Backup traffic does not travel over the LAN since the server is backing up directly to the SAN. This makes for faster backup without increasing the load on the Ethernet network.
- Virtualization. NAS supports virtualized environments, but SANs are better suited to large-scale and/or high-performance deployments. The storage area network quickly transfers multiple I/O streams between VMs and the virtualization host, and high scalability enables dynamic processing.
- Video editing. Video editing applications need very low latency and very high data transfer rates. SANs provide this high performance because they cable directly to the video editing desktop client, dispensing with an extra server layer. Video editing environments need a third-party SAN distributed file system and per-node load balancing control.
SAN and NAS Convergence
Unified (or multi-protocol) SAN/NAS combines file and block storage into a single storage system. These unified systems support up to four protocols. The storage controllers allocate physical storage for NAS or SAN processing.
They are popular with mid-range enterprises that need both SAN and NAS but lack the data center space and specialized admins for separate systems. Converged SAN/NAS systems are a much smaller part of the market than distinct deployments but show steady growth.