2.2. Planning Node Hardware Configurations

Acronis Storage works on top of commodity hardware, so you can create a cluster from regular servers, disks, and network cards. Still, to achieve optimal performance, you must meet a number of requirements and should follow a number of recommendations.

2.2.1. Hardware Requirements

The following table lists the minimal and recommended hardware for a single node in the cluster:

Type | Minimal | Recommended
CPU | Dual-core CPU | Intel Xeon E5-2620V2 or faster; at least one CPU core per 8 HDDs
RAM | 4GB | 16GB ECC or more, plus 0.5GB ECC per each HDD
Storage | System: 100GB SATA HDD. Metadata: 100GB SATA HDD (on the first five nodes in the cluster). Storage: 100GB SATA HDD | System: 250GB SATA HDD. Metadata+Cache: One or more recommended enterprise-grade SSDs with power loss protection; 100GB or more capacity; and 75 MB/s sequential write performance per serviced HDD. For example, a node with 10 HDDs will need an SSD with at least 750 MB/s sequential write speed (on the first five nodes in the cluster). Storage: Four or more HDDs or SSDs; 1 DWPD endurance minimum, 10 DWPD recommended
Disk controller | None | HBA or RAID
Network | 1Gbps or faster network interface | Two 10Gbps network interfaces; dedicated links for internal and public networks
Sample configuration | | Intel Xeon E5-2620V2, 32GB, 2xST1000NM0033, 32xST6000NM0024, 2xMegaRAID SAS 9271/9201, Intel X540-T2, Intel P3700 800GB
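
The per-HDD rules in the Recommended column can be turned into a quick sizing calculation. The following is a minimal sketch, not an official sizing tool: the function name and the returned keys are hypothetical, and rounding the CPU core count up is an assumption about how "at least one CPU core per 8 HDDs" should be applied.

```python
import math


def recommended_node_resources(hdd_count):
    """Apply the per-HDD rules from the Recommended column to one node.

    Hypothetical helper for illustration; the names are not part of the product.
    """
    if hdd_count < 1:
        raise ValueError("hdd_count must be at least 1")
    return {
        # At least one CPU core per 8 HDDs (rounding up is an assumption).
        "min_cpu_cores": math.ceil(hdd_count / 8),
        # 16 GB ECC base plus 0.5 GB ECC per HDD.
        "ram_gb": 16 + 0.5 * hdd_count,
        # 75 MB/s of SSD sequential write speed per serviced HDD.
        "ssd_seq_write_mb_s": 75 * hdd_count,
    }


print(recommended_node_resources(10))
```

For a node with 10 HDDs, the SSD figure matches the 750 MB/s example given in the table.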

2.2.2. Hardware Recommendations

The following recommendations explain the benefits added by specific hardware in the hardware requirements table and are meant to help you configure the cluster hardware in an optimal way:

2.2.2.1. Cluster Composition Recommendations

Designing an efficient cluster means finding a compromise between performance and cost that suits your purposes. When planning, keep in mind that a cluster with many nodes and few disks per node offers higher performance while a cluster with the minimal number of nodes (5) and a lot of disks per node is cheaper. See the following table for more details.

Design considerations | Minimum nodes (5), many disks per node | Many nodes, few disks per node
Optimization | Lower cost. | Higher performance.
Free disk space to reserve | More space to reserve for cluster rebuilding as fewer healthy nodes will have to store the data from a failed node. | Less space to reserve for cluster rebuilding as more healthy nodes will have to store the data from a failed node.
Redundancy | Fewer erasure coding choices. | More erasure coding choices.
Cluster balance and rebuilding performance | Worse balance and slower rebuilding. | Better balance and faster rebuilding.
Network capacity | More network bandwidth required to maintain cluster performance during rebuilding. | Less network bandwidth required to maintain cluster performance during rebuilding.
Favorable data type | Cold data (e.g., backups). | Hot data (e.g., virtual environments).
Sample server configuration | Supermicro SSG-6047R-E1R36L (Intel Xeon E5-2620 v4 CPU, 32GB RAM, 36 x 12TB HDDs, 1 x 500GB system disk). | Supermicro SYS-2028TP-HC0R-SIOM (4 x Intel E5-2620 v4 CPUs, 4 x 16GB RAM, 24 x 1.9TB Samsung SM863a SSDs).

Note

  1. These considerations only apply if the failure domain is host.
  2. The speed of rebuilding in the replication mode does not depend on the number of nodes in the cluster.
  3. Acronis Storage supports hundreds of disks per node. If you plan to use more than 36 disks per node, contact sales engineers who will help you design a more efficient cluster.

2.2.2.2. General Hardware Recommendations

  • At least five nodes are required for a production environment. This is to ensure that the cluster can survive failure of two nodes without data loss.
  • One of the strongest features of Acronis Storage is scalability. The bigger the cluster, the better Acronis Storage performs. It is recommended to create production clusters from at least ten nodes for improved resiliency, performance, and fault tolerance in production scenarios.
  • Even though a cluster can be created on top of varied hardware, using nodes with similar hardware will yield better cluster performance, capacity, and overall balance.
  • Any cluster infrastructure must be tested extensively before it is deployed to production. Such common points of failure as SSD drives and network adapter bonds must always be thoroughly verified.
  • It is not recommended for production to run Acronis Storage in virtual machines or on top of SAN/NAS hardware that has its own redundancy mechanisms. Doing so may negatively affect performance and data availability.
  • To achieve best performance, keep at least 20% of cluster capacity free.
  • During disaster recovery, Acronis Storage may need additional disk space for replication. Make sure to reserve at least as much space as any single storage node has.
  • If you plan to use Acronis Backup Gateway to store backups in the cloud, make sure the local storage cluster has plenty of logical space for staging (keeping backups locally before sending them to the cloud). For example, if you perform backups daily, provide enough space for at least 1.5 days’ worth of backups. For more details, see the Administrator’s Guide.
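
The capacity-related recommendations above (20% free space, a rebuild reserve equal to a single node, and 1.5 days of staging for daily backups) can be computed side by side. This is a rough sketch under stated assumptions: the function and key names are hypothetical, "any single storage node" is read as the largest node, and the three figures are reported separately because the text does not say how they combine.

```python
def capacity_guidelines(node_capacities_tb, daily_backup_tb=0.0):
    """Report the capacity figures suggested by the recommendations.

    node_capacities_tb: raw capacity of each storage node, in TB.
    daily_backup_tb: size of one day's backups, if Backup Gateway staging is used.
    """
    total = sum(node_capacities_tb)
    return {
        "total_tb": total,
        # Keep at least 20% of cluster capacity free for best performance.
        "free_for_performance_tb": total * 20 / 100,
        # Reserve at least as much space as a single node has
        # (assumption: use the largest node).
        "rebuild_reserve_tb": max(node_capacities_tb),
        # Staging space for at least 1.5 days' worth of daily backups.
        "staging_tb": 1.5 * daily_backup_tb,
    }


print(capacity_guidelines([100, 100, 100, 100, 100], daily_backup_tb=10))
```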

2.2.2.3. Storage Hardware Recommendations

  • It is possible to use disks of different size in the same cluster. However, keep in mind that, given the same IOPS, smaller disks will offer higher performance per terabyte of data compared to bigger disks. It is recommended to group disks with the same IOPS per terabyte in the same tier.
  • Using the recommended SSD models may help you avoid loss of data. Not all SSD drives can withstand enterprise workloads; some may break down in the first months of operation, resulting in TCO spikes.
    • SSD memory cells can withstand a limited number of rewrites. An SSD drive should be viewed as a consumable that you will need to replace after a certain time. Consumer-grade SSD drives can withstand a very low number of rewrites (so low, in fact, that these numbers are not shown in their technical specifications). SSD drives intended for Acronis Storage clusters must offer at least 1 DWPD endurance (10 DWPD is recommended). The higher the endurance, the less often SSDs will need to be replaced, improving TCO.
    • Many consumer-grade SSD drives can ignore disk flushes and falsely report to operating systems that data was written while it in fact was not. Examples of such drives include OCZ Vertex 3, Intel 520, Intel X25-E, and Intel X25-M G2. These drives are known to be unsafe in terms of data commits: they should not be used with databases, and they may easily corrupt the file system in case of a power failure. For these reasons, use enterprise-grade SSD drives that obey the flush rules (for more information, see the PostgreSQL documentation). Enterprise-grade SSD drives that operate correctly usually have the power loss protection property in their technical specification. Some of the market names for this technology are Enhanced Power Loss Data Protection (Intel), Cache Power Protection (Samsung), Power-Failure Support (Kingston), and Complete Power Fail Protection (OCZ).
    • Consumer-grade SSD drives usually have unstable performance and are not suited to withstand sustained enterprise workloads. For this reason, pay attention to sustained load tests when choosing SSDs. We recommend the following enterprise-grade SSD drives, which are the best in terms of performance, endurance, and investment: Intel S3710, Intel P3700, Huawei ES3000 V2, Samsung SM1635, and Sandisk Lightning.
  • The use of SSDs for write caching improves random I/O performance and is highly recommended for all workloads with heavy random access (e.g., iSCSI volumes).
  • Running metadata services on SSDs improves cluster performance. To also minimize CAPEX, the same SSDs can be used for write caching.
  • If capacity is the main goal and you need to store non-frequently accessed data, choose SATA disks over SAS ones. If performance is the main goal, choose SAS disks over SATA ones.
  • The more disks per node the lower the CAPEX. As an example, a cluster created from ten nodes with two disks in each will be less expensive than a cluster created from twenty nodes with one disk in each.
  • Using SATA HDDs with one SSD for caching is more cost effective than using only SAS HDDs without such an SSD.
  • Use HBA controllers as they are less expensive and easier to manage than RAID controllers.
  • Disable all RAID controller caches for SSD drives. Modern SSDs have good performance that can be reduced by a RAID controller’s write and read cache. It is recommended to disable caching for SSD drives and leave it enabled only for HDD drives.
  • If you use RAID controllers, do not create RAID volumes from HDDs intended for storage (you can still do so for system disks). Each storage HDD needs to be recognized by Acronis Storage as a separate device.
  • If you use RAID controllers with caching, equip them with backup battery units (BBUs) to protect against cache loss during power outages.

2.2.2.4. Network Hardware Recommendations

  • Use separate networks (and, ideally albeit optionally, separate network adapters) for internal and public traffic. Doing so will prevent public traffic from affecting cluster I/O performance and also prevent possible denial-of-service attacks from the outside.
  • Network latency dramatically reduces cluster performance. Use quality network equipment with low latency links. Do not use consumer-grade network switches.
  • Do not use desktop network adapters like Intel EXPI9301CTBLK or Realtek 8129 as they are not designed for heavy load and may not support full-duplex links. Also use non-blocking Ethernet switches.
  • To avoid intrusions, Acronis Storage should be on a dedicated internal network inaccessible from outside.
  • Use one 1 Gbit/s link per each two HDDs on the node (rounded up). For one or two HDDs on a node, two bonded network interfaces are still recommended for high network availability. The reason for this recommendation is that 1 Gbit/s Ethernet networks can deliver 110-120 MB/s of throughput, which is close to sequential I/O performance of a single disk. Since several disks on a server can deliver higher throughput than a single 1 Gbit/s Ethernet link, networking may become a bottleneck.
  • For maximum sequential I/O performance, use one 1 Gbit/s link per each hard drive, or one 10 Gbit/s link per node. Even though I/O operations are most often random in real-life scenarios, sequential I/O is important in backup scenarios.
  • For maximum overall performance, use one 10 Gbit/s link per node (or two bonded for high network availability).
  • It is not recommended to configure 1 Gbit/s network adapters to use non-default MTUs (e.g., 9000-byte jumbo frames). Such settings require additional configuration of switches and often lead to human error. 10 Gbit/s network adapters, on the other hand, need to be configured to use jumbo frames to achieve full performance.
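
The 1 Gbit/s link-count rules above reduce to simple arithmetic. The sketch below is illustrative only: the function name and the `maximize_sequential` flag are hypothetical, and it covers 1 Gbit/s links only (the alternative is one 10 Gbit/s link per node, or two bonded).

```python
import math


def gigabit_links_for_node(hdd_count, maximize_sequential=False):
    """Estimate the number of 1 Gbit/s links for a node with hdd_count HDDs."""
    if hdd_count < 1:
        raise ValueError("hdd_count must be at least 1")
    if maximize_sequential:
        # One 1 Gbit/s link per hard drive for maximum sequential I/O.
        return hdd_count
    # One link per two HDDs, rounded up; at least two bonded interfaces
    # are still recommended for high network availability.
    return max(2, math.ceil(hdd_count / 2))


for hdds in (1, 5, 10):
    print(hdds, gigabit_links_for_node(hdds))
```

The reasoning behind the rule is that a 1 Gbit/s link delivers 110-120 MB/s, roughly the sequential throughput of one disk, so the link count has to grow with the number of disks to keep the network from becoming the bottleneck.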

2.2.3. Hardware and Software Limitations

Hardware limitations:

  • Each physical server must have at least two disks, with three roles assigned among them: System, Metadata, and Storage. The System role can be combined with the Metadata or Storage role if the system disk capacity is greater than 100GB.

    Note

    1. It is recommended to assign the System+Metadata role to an SSD. Assigning both these roles to an HDD will result in mediocre performance suitable only for cold data (e.g., archiving).
    2. The System role cannot be combined with the Cache and Metadata+Cache roles. The reason is that I/O generated by the operating system and applications would contend with I/O generated by journaling, negating its performance benefits.
  • Five servers are required to test all the features of the product.

  • The system disk must have at least 100 GB of space.
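
The disk role rules in the hardware limitations can be expressed as a small check. This is a hypothetical sketch for planning purposes, not product code: the function name, the data layout, and the message strings are all assumptions, and the Metadata+Cache role is modeled as a disk holding both "Metadata" and "Cache".

```python
REQUIRED_ROLES = ("System", "Metadata", "Storage")


def check_node_disks(disks):
    """Return a list of rule violations for one node.

    disks: list of (roles, capacity_gb) pairs, where roles is a set of
    role names such as {"System", "Metadata"}.
    """
    problems = []
    if len(disks) < 2:
        problems.append("node has fewer than two disks")
    # Collect every role assigned anywhere on the node.
    assigned = set().union(*(roles for roles, _ in disks))
    for role in REQUIRED_ROLES:
        if role not in assigned:
            problems.append(f"{role} role is not assigned")
    for roles, capacity_gb in disks:
        if "System" not in roles:
            continue
        if capacity_gb < 100:
            problems.append("system disk is smaller than 100 GB")
        if "Cache" in roles:
            problems.append("System role is combined with a cache role")
        if roles & {"Metadata", "Storage"} and capacity_gb <= 100:
            problems.append("combined system disk is not larger than 100 GB")
    return problems


# A System+Metadata disk plus one storage disk passes all checks.
print(check_node_disks([({"System", "Metadata"}, 250), ({"Storage"}, 1000)]))
```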

Software limitations:

  • The maintenance mode is not supported. Use SSH to shut down or reboot a node.
  • One node can be a part of only one cluster.
  • Only one S3 cluster can be created on top of a storage cluster.
  • Only predefined redundancy modes are available in the management panel.
  • Thin provisioning is always enabled for all data and cannot be configured otherwise.

Note

For network limitations, see Network Limitations.

2.2.4. Minimum Configuration

The minimum configuration described in the table will let you evaluate Acronis Storage features:

Node # | 1st disk role | 2nd disk role | 3rd and other disk roles | Access points
1 | System | Metadata | Storage | iSCSI, Object Storage private, S3 public, NFS, ABGW
2 | System | Metadata | Storage | iSCSI, Object Storage private, S3 public, NFS, ABGW
3 | System | Metadata | Storage | iSCSI, Object Storage private, S3 public, NFS, ABGW
4 | System | Metadata | Storage | iSCSI, Object Storage private, ABGW
5 | System | Metadata | Storage | iSCSI, Object Storage private, ABGW
5 nodes in total | | 5 MDSs in total | 5 or more CSs in total | Access point services run on five nodes in total

Note

SSD disks can be assigned metadata and cache roles at the same time, freeing up one more disk for the storage role.

Although five nodes are recommended even for the minimal configuration, you can start evaluating Acronis Storage with just one node and add more nodes later. At the very least, a storage cluster must have one metadata service and one chunk service running. However, such a configuration will have two key limitations:

  1. Just one MDS will be a single point of failure. If it fails, the entire cluster will stop working.
  2. Just one CS will be able to store just one chunk replica. If it fails, the data will be lost.

Important

If you deploy Acronis Storage on a single node, you must take care of making its storage persistent and redundant to avoid data loss. If the node is physical, it must have multiple disks so you can replicate the data among them. If the node is a virtual machine, make sure that this VM is made highly available by the solution it runs on.

Acronis Backup Gateway works with the local object storage in the staging mode. This means that the data to be replicated, migrated, or uploaded to a public cloud is first stored locally and only then sent to the destination. It is vital that the local object storage is persistent and redundant so the local data does not get lost. There are multiple ways to ensure the persistence and redundancy of the local storage. You can deploy your Acronis Backup Gateway on multiple nodes and select a good redundancy mode. If your gateway is deployed on a single node in Acronis Storage, you can make its storage redundant by replicating it among multiple local disks. If your entire Acronis Storage installation is deployed in a single virtual machine with the sole purpose of creating a gateway, make sure this VM is made highly available by the solution it runs on.

2.2.6. Raw Disk Space Considerations

When planning the Acronis Storage infrastructure, keep in mind the following to avoid confusion:

  • The capacity of HDDs and SSDs is measured and specified with decimal, not binary prefixes, so “TB” in disk specifications usually means “terabyte”. The operating system, however, displays drive capacity using binary prefixes, so “TB” there means “tebibyte”, which is a noticeably larger unit. As a result, disks may show capacity smaller than the one marketed by the vendor. For example, a disk with 6TB in specifications may be shown to have 5.45 TB of actual disk space in Acronis Storage.
  • Acronis Storage reserves 5% of disk space for emergency needs.

Therefore, if you add a 6TB disk to a cluster, the available physical space should increase by about 5.2 TB.
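
The arithmetic behind this estimate can be reproduced in a few lines. The following is a minimal sketch: the function name and the `reserve_fraction` parameter are hypothetical, and it assumes the 5% reserve is taken from the capacity as displayed in binary units.

```python
def usable_space(marketed_tb, reserve_fraction=0.05):
    """Convert a marketed (decimal) capacity to displayed and usable space.

    Returns (displayed, usable), both in binary units (tebibytes).
    """
    # Vendors count 1 TB as 10**12 bytes; the OS counts 1 "TB" as 2**40 bytes.
    displayed = marketed_tb * 10**12 / 2**40
    # Acronis Storage reserves 5% of disk space for emergency needs.
    usable = displayed * (1 - reserve_fraction)
    return displayed, usable


displayed, usable = usable_space(6)
print(f"displayed: {displayed:.3f}, usable: {usable:.3f}")
```

For a 6TB disk, the usable figure comes out at roughly 5.2, in line with the estimate above.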