Saturday, October 15, 2011

Which File System Blocksize is suitable for my system?

Taken from IBM Developer Network "File System Blocksize"

Although the article refers to the General Parallel File System (GPFS), it contains many good pointers that system administrators can take note of.

Here are some excerpts from the article:

This is a question many system administrators ask before they start preparing a system: how do you choose a blocksize for your file system? IBM Developer Network (File System Blocksize) recommends the following blocksizes for various types of applications.


IO Type | Application Examples | Blocksize
Large Sequential IO | Scientific Computing, Digital Media | 1MB to 4MB
Relational Database | DB2, Oracle | 512KB
Small I/O Sequential | General File Service, File-based Analytics, Email, Web Applications | 256KB
Special* | Special | 16KB to 64KB
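
As a rough illustration, the table above can be written as a small lookup. This is only a sketch: the key names and the function `recommend_blocksize` are made up for this example and are not part of GPFS or the IBM article. The fallback range is the 256KB to 512KB compromise the article suggests when the IO profile is unknown.

```python
# Blocksize recommendations transcribed from the table above, as
# (low, high) ranges in KB. The key names are hypothetical labels.
RECOMMENDED_BLOCKSIZE_KB = {
    "large_sequential": (1024, 4096),   # Scientific Computing, Digital Media
    "relational_database": (512, 512),  # DB2, Oracle
    "small_sequential": (256, 256),     # File service, analytics, email, web
    "special": (16, 64),
}


def recommend_blocksize(io_type):
    """Return the (low_kb, high_kb) range for a known IO type.

    Unknown profiles fall back to the 256KB-512KB compromise range.
    """
    return RECOMMENDED_BLOCKSIZE_KB.get(io_type, (256, 512))


print(recommend_blocksize("relational_database"))  # (512, 512)
print(recommend_blocksize("no_idea"))              # (256, 512)
```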

What if I do not know my application IO profile?
Often you do not have good information on the nature of the IO profile, or the applications are so diverse that it is difficult to optimize for one or the other. There are generally two approaches to designing for this type of situation: separation or compromise.

Separation
In this model you create two file systems, one with a large blocksize for sequential applications and one with a smaller blocksize for small-file applications. You can gain benefits from having file systems of two different blocksizes even on a single type of storage, or you can use different types of storage for each file system to further optimize for the workload. In either case the idea is that you provide two file systems to your end users, for scratch space on a compute cluster for example. The end users can then run tests themselves by pointing the application at one file system or the other and determining by direct testing which is best for their workload. In this situation you may have one file system optimized for sequential IO with a 1MB blocksize and one for more random workloads with a 256KB blocksize.
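
For users who want to try the direct testing described above, here is a minimal sketch of a sequential-write probe. It is not a real benchmark: it only times one large write, results are affected by caching, and a proper tool or the application itself will give better answers. The function name and default sizes are assumptions chosen for illustration.

```python
import os
import time


def time_sequential_write(directory, total_bytes=64 * 1024 * 1024,
                          chunk_bytes=1024 * 1024):
    """Write total_bytes to a scratch file in `directory`; return MB/s."""
    path = os.path.join(directory, "blocksize_probe.tmp")
    chunk = b"\0" * chunk_bytes
    start = time.perf_counter()
    try:
        with open(path, "wb") as f:
            written = 0
            while written < total_bytes:
                f.write(chunk)
                written += chunk_bytes
            # Push the data to storage so the timing covers more than
            # the page cache.
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.perf_counter() - start
    finally:
        # Always clean up the probe file, even if the write failed.
        if os.path.exists(path):
            os.remove(path)
    return (written / (1024 * 1024)) / elapsed
```

You would call it once per file system, for example `time_sequential_write("/scratch_1m")` and `time_sequential_write("/scratch_256k")`, and compare the two numbers. Both mount points are hypothetical names for the two scratch file systems.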

Compromise
In this situation you either do not have sufficient information on workloads (i.e. end users won't think about IO performance) or do not have enough storage for multiple file systems. In this case it is generally recommended to go with a blocksize of 256KB or 512KB, depending on the general workloads and the storage model used. With a 256KB blocksize you will still get good sequential performance (though not necessarily peak marketing numbers), and you will get good performance and space utilization with small files (with a 256KB blocksize, the minimum allocation to a file is 8KB). This is a good configuration for multi-purpose research workloads where the application developers are focusing on their algorithms more than on IO optimization.
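
The 8KB figure is consistent with a minimum allocation of 1/32 of the blocksize (256KB / 32 = 8KB). Assuming that ratio holds for the other blocksizes too (an assumption here; check the documentation for your GPFS release), the sketch below shows roughly how much space a small file takes up at each blocksize. It ignores any special handling of very small files, and the function names are made up for this example.

```python
# Assumption: the minimum allocation is 1/32 of the blocksize, which
# matches the 256KB -> 8KB figure quoted above.
SUBBLOCKS_PER_BLOCK = 32


def min_allocation_kb(blocksize_kb):
    """Smallest unit of space a file can be given, in KB."""
    return blocksize_kb // SUBBLOCKS_PER_BLOCK


def allocated_kb(file_kb, blocksize_kb):
    """Space used by a file, rounded up to whole minimum allocations."""
    unit = min_allocation_kb(blocksize_kb)
    units_needed = -(-file_kb // unit)  # ceiling division
    return units_needed * unit


print(min_allocation_kb(256))   # 8
print(allocated_kb(4, 256))     # 8   (a 4KB file on a 256KB file system)
print(allocated_kb(4, 1024))    # 32  (the same file on a 1MB file system)
```

Under that assumption, a directory full of 4KB files uses four times as much space on the 1MB file system as on the 256KB one, which is why 256KB is the safer choice for mixed workloads.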
