
Automatic Policy Based Purging In SSD Cache of StorTrends

Hard disk drives (HDDs), a commonly used secondary storage medium, add latency to data storage and retrieval because of their mechanical nature. To reduce this latency, a cache, usually smaller but built on a faster storage medium, holds data so that future requests can be served faster. Solid state drives (SSDs), which use non-volatile flash memory, have much lower latency than HDDs but cost more. The idea is to use one or more SSDs to form a cache that is both fast and large enough to hold many gigabytes of hot data.

StorTrends iTX has an SSD Cache for each Storage Pool. Every Storage Pool comprises a number of logical drives, each of which is a RAID set of physical drives. Placing an SSD Cache in front of the mechanical RAID drives helps the storage stack deliver good read and write performance, especially when the IO pattern is random. The SSD Cache not only caches IOs but also continuously purges (flushes) dirty data to its physical location on the logical drives. This mechanism runs in parallel with ongoing IO to ensure that free blocks are always available to hold data from new IOs.

A system administrator creates a Storage Pool with SSD Cache; the volumes in the pool are accessed from a Windows or Linux system through an iSCSI initiator. While creating a volume, the user can select one of the following profiles for using SSD Cache:

  • Accelerate All IOs
  • Accelerate Random IOs
  • No SSD Cache Acceleration
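
As an illustration of how these profiles decide whether an IO passes through the SSD Cache, here is a minimal Python sketch. The names `Profile` and `uses_ssd_cache` are hypothetical and are not part of StorTrends iTX; the sketch only assumes that random IOs on an Accelerate Random IOs volume are cached while its sequential IOs go straight to the logical drive.

```python
from enum import Enum


class Profile(Enum):
    """Volume acceleration profiles as listed above (names are illustrative)."""
    ACCELERATE_ALL = "Accelerate All IOs"
    ACCELERATE_RANDOM = "Accelerate Random IOs"
    NO_ACCELERATION = "No SSD Cache Acceleration"


def uses_ssd_cache(profile: Profile, is_random: bool) -> bool:
    """Return True if an IO on a volume with this profile goes through the SSD Cache."""
    if profile is Profile.ACCELERATE_ALL:
        return True            # every IO is cached
    if profile is Profile.ACCELERATE_RANDOM:
        return is_random       # only random IOs are cached
    return False               # IOs bypass the cache entirely
```

For example, `uses_ssd_cache(Profile.ACCELERATE_RANDOM, is_random=False)` returns `False`, meaning a sequential stream on such a volume lands on the logical drive directly.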

The SSD Cache module purges at its data block granularity. In StorTrends iTX 2.8 v2.14, SSD Cache runs four purge threads by default. The thread count was set from an approximate calculation so that each thread continuously issues 512 IOPS while purging, for a total of about 2,000 IOPS (4 x 512 = 2,048) from SSD Cache. The purge targets the logical drives, which are RAID sets built on mechanical drives. In StorTrends, this total purge IOPS makes use of the maximum disk queue length of the HDDs in the RAID. Hence, just as SSD Cache continuously serves IO, the purge flushes dirty data to the logical drives in parallel, freeing blocks in SSD Cache for new data.
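
The purge rate arithmetic can be checked with a short Python sketch. The thread count and per-thread IOPS come from the text; the 64 KB block size used for the throughput estimate is an assumption made here for illustration, and the function names are hypothetical.

```python
PURGE_THREADS = 4        # default thread count in iTX 2.8 v2.14 (from the text)
IOPS_PER_THREAD = 512    # purge IOPS issued by each thread (from the text)
ASSUMED_BLOCK_KIB = 64   # assumption: purge block size, not stated for the purge path


def total_purge_iops(threads: int = PURGE_THREADS,
                     iops_per_thread: int = IOPS_PER_THREAD) -> int:
    """Aggregate purge IOPS across all purge threads."""
    return threads * iops_per_thread


def purge_throughput_mib_s(total_iops: int,
                           block_kib: int = ASSUMED_BLOCK_KIB) -> float:
    """Approximate purge throughput in MiB/s for a given block size."""
    return total_iops * block_kib / 1024


iops = total_purge_iops()
print(iops, purge_throughput_mib_s(iops))
```

With four threads the aggregate is 2,048 IOPS, which the text rounds to 2,000.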


Problem

In a system with a mix of acceleration profiles, volumes with the Accelerate-All-IOs profile use both the read cache and the write cache for all IOs. Volumes configured with the Random-Only profile send only their random IOs through the SSD Cache, while their sequential IOs, and all IOs on No-SSD-Cache volumes, go directly to the logical drive and skip the SSD Cache. When the write load through SSD Cache is very high, that is, more than 256 outstanding IOs at a granularity of 64 KB, the purge is stopped completely. This keeps application IOs responsive. The SSD Cache module then waits for the outstanding IO count to fall below that level. A sustained high IO load can therefore fill the SSD Cache without any purge activity, leaving the SSD Cache unusable for further IO.
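
The effect of this all-or-nothing gating can be sketched with a toy simulation in Python. Only the 256 outstanding IO threshold and the four-thread default come from the text; the function names, the tick-based model, and the block counts are hypothetical values chosen for illustration.

```python
HIGH_OUTSTANDING = 256   # purge stops above this many outstanding IOs (from the text)
MAX_THREADS = 4          # default purge thread count (from the text)


def legacy_purge_threads(outstanding_ios: int) -> int:
    """Original behaviour: all threads, or none when outstanding IO is high."""
    return 0 if outstanding_ios > HIGH_OUTSTANDING else MAX_THREADS


def simulate_dirty_blocks(capacity: int, incoming_per_tick: int,
                          purge_per_thread: int, outstanding_ios: int,
                          ticks: int) -> int:
    """Return the number of dirty blocks after `ticks` steps of constant load."""
    dirty = 0
    for _ in range(ticks):
        # New writes dirty more blocks, up to the cache capacity.
        dirty = min(capacity, dirty + incoming_per_tick)
        # The purge frees blocks only if the gate allows any threads to run.
        threads = legacy_purge_threads(outstanding_ios)
        dirty = max(0, dirty - threads * purge_per_thread)
    return dirty
```

In this model, a sustained load of 300 outstanding IOs (`simulate_dirty_blocks(1000, 100, 50, 300, 20)`) leaves the cache completely dirty, while the same write rate at 100 outstanding IOs is fully absorbed by the purge.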

Conversely, when the SSD Cache is not under high load, the purge runs with four threads to maximise the disk queue length at the logical drive level. This affects the performance of volumes whose IOs hit the logical drive directly. A sequential IO stream on a Random-Only profile volume, or any type of IO load on a No-SSD-Cache profile volume, can be slowed by the active purge from the SSD Cache. The north side IOs (IOs from the host/initiator) going to the logical drive wait for their turn in the disk queue. When completion is delayed, IO latency rises sharply and system performance drops drastically.

Thus, irrespective of whether the IO load through the SSD cache is low or high, the purging or lack of it is problematic.

Figure 1: Top Level View of StorTrends iTX operating with SSD Cache

The graph below shows how the performance of a non-SSD-cached volume varies with outstanding IO because of purging from other SSD-cached volumes.

Figure 2: Performance of one non-SSD Cached volume

Solution

As a solution, SSD Cache in the StorTrends 3500i maps the terms high, medium and low outstanding IO to discrete numbers to create a well-defined policy.

The core of the solution is that SSD Cache must keep purging so that it never becomes full, rather than cancelling the purge whenever outstanding IO is high. In addition, when outstanding IO is medium, it should not use all threads for purging, so that north side IO to non-SSD-cached volumes gets consistent performance. This is achieved by running the purge with a different number of threads depending on the outstanding IO, and the solution is implemented in StorTrends iTX 2.8v3.0. The SSD Cache purge intelligence can also start and stop threads at runtime based on outstanding IO and dirty %.

So, when outstanding IO is high and the dirty percentage is greater than 75%, one thread runs continuously to purge dirty data, so that SSD-cached volumes keep hitting the SSD Cache. When outstanding IO is low, all threads are used to purge. When outstanding IO is medium, the policy checks the dirty % and decides how many threads to run.
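
A minimal Python sketch of such a policy follows. The 75% dirty threshold, the four-thread maximum, and the 256 outstanding IO level come from the text; everything else is an assumption for illustration, including the low/medium boundary (`LOW_MAX`), the thread counts chosen for the medium band, and pausing the purge when outstanding IO is high but dirty % is at or below 75%. The actual values used by StorTrends iTX are not given here.

```python
MAX_THREADS = 4    # all purge threads (from the text)
HIGH_MIN = 256     # "high" outstanding IO starts above this (from the text)
LOW_MAX = 64       # assumption: at or below this, outstanding IO counts as "low"


def purge_threads(outstanding_ios: int, dirty_pct: float) -> int:
    """Pick the number of purge threads from outstanding IO and dirty %."""
    if outstanding_ios <= LOW_MAX:
        return MAX_THREADS                 # low load: purge with all threads
    if outstanding_ios > HIGH_MIN:
        # High load: keep one thread running once the cache is mostly dirty.
        # Assumption: below that level the purge stays paused.
        return 1 if dirty_pct > 75 else 0
    # Medium load: never all threads; scale with dirty % (assumed mapping).
    if dirty_pct > 75:
        return 3
    if dirty_pct > 50:
        return 2
    return 1
```

Because the function is stateless, it can be re-evaluated periodically at runtime so that threads are started or stopped as the load and dirty % change.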

There are special cases in which this policy is not followed, for example when the SSD Cache is in a planned or unplanned degraded state, i.e. one drive has been removed from the SSD Cache and needs replacement. In this situation the dirty data must be purged quickly before the other drive also fails (which is fairly likely, because both drives were put into service at the same time and may have similar lifetimes). In this case the purge runs with all four threads.
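
The degraded-state exception can be expressed as an override on top of whatever thread count the normal policy selects. The sketch below is illustrative only: `CacheState` and `effective_purge_threads` are hypothetical names, and the policy result is passed in as a plain number rather than computed.

```python
from dataclasses import dataclass

MAX_PURGE_THREADS = 4   # all purge threads (from the text)


@dataclass
class CacheState:
    """Hypothetical snapshot of the SSD Cache used by this sketch."""
    degraded: bool        # True when a cache drive is missing or failed
    policy_threads: int   # thread count chosen by the normal purge policy


def effective_purge_threads(state: CacheState) -> int:
    """Degraded state overrides the policy and purges with all threads."""
    if state.degraded:
        return MAX_PURGE_THREADS
    return state.policy_threads
```

For example, `effective_purge_threads(CacheState(degraded=True, policy_threads=1))` returns 4, so dirty data is flushed as fast as possible while the cache has no redundancy.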

The graph below shows the performance of a non-SSD-cached volume as a function of the number of purge threads and the outstanding IO on the SSD-cached volume in StorTrends iTX.

Figure 3: Performance of one non-SSD Cached volume depending on different outstanding IO

Figure 4: Performance of a volume with varying Outstanding IO as per SSD Cache Purge Policy Intelligence.

Conclusion

In StorTrends 3500i storage servers running StorTrends iTX 2.8v3.0, policy-based SSD Cache purging helps keep the performance of both SSD-cached and non-SSD-cached volumes consistent. It also ensures that SSD-cached volumes are never starved of blocks in the SSD Cache.