NOTE: The content of this article has been published in the official Exchange 2007 documentation. We recommend that you check the documentation for the most up-to-date version. Please go here:
http://technet.microsoft.com/en-us/library/bb124518(EXCHG.80).aspx
Edit: this post has been updated on 7/3/07, updates from previous versions are listed here.
Introduction
This is the final blog in my series about Exchange 2007 storage. In this blog I will tie together all of the content from previous blogs to outline our recommendations for configuring, validating, and monitoring your Exchange storage solution.
There are four key objectives to this blog.
- Understand what information you need to correctly design a storage solution for Exchange 2007.
- Apply hardware and technology to these storage designs.
- Validate the storage design.
- Monitor the storage design.
Capacity and performance are often at odds with each other when it comes to physical disks, and both must be considered before making a purchasing decision. The first concern is whether there will be enough space to store all of the data. The second concern is that the transactional I/O must be measured or predicted to ensure that the solution also meets the performance requirements for acceptable disk latency and a responsive end user experience. The third concern is to ensure that the non-transactional I/O has both enough time to complete, and enough disk performance and throughput to meet the required service level agreement (SLA). The Holy Grail is to take these three parts and find a balance in the design of the actual hardware that satisfies all three.
Capacity
Having enough capacity is absolutely critical. When a database LUN runs out of space, the databases on that LUN will dismount. When a transaction log LUN runs out of space, it will cause all of the databases in that storage group to dismount. Provisioning additional space is often hard to do quickly, and performing an offline compaction could take a long time. In most cases running out of disk space will result in an interruption of availability for one or more databases for a period of time that typically exceeds most recovery time objectives (RTO).
Mailbox Size\Mailbox Count
The first metric to understand is mailbox size. The amount of data an end user is allowed to store in their mailbox will help in determining how many users can be housed on the server. While final mailbox sizes and quotas can and do change, having a goal in mind is the first step in determining your needed capacity. For example, if you have 4000 users on a server with a 250MB mailbox quota, then you need at least 1TB of disk space. Moreover, there are additional components which must be factored into the equation. If a hard limit is not set on the mailbox quota, it is difficult to estimate how much capacity you will need.
Database Whitespace
The database size on the physical disk isn't just the number of users multiplied by the user quota. When the majority of users are not near their mailbox quota, the databases will consume less space and whitespace isn't a capacity concern. The database itself will always have free pages, or whitespace, spread throughout. During online maintenance, items marked for removal from the database are removed, freeing up these pages. The percentage of whitespace is constantly changing with the highest percentage immediately after online maintenance, and the lowest percentage right before online maintenance.
The whitespace in the database can be approximated by the amount of mail sent and received by its users. For example, if you have 100 - 2GB mailboxes (200GB) in a database that send and receive an average of 10MB of mail per day, the whitespace would be approximately 1GB (100*10MB).
Whitespace can grow beyond any approximation if online maintenance is not able to complete a full pass. It is important that enough time is allocated for online maintenance to run each night, so that a full pass can complete within one week or less.
Database Dumpster
Each database has a dumpster that stores hard deleted items. By default, items are stored for 7 days in Exchange 2003, and 14 days in Exchange 2007. These include items that have been purged from the deleted items folder. Exchange 2007 will increase the overhead consumed by the database dumpster, because deleted items will now be stored for twice as long.
After the retention period has passed, these items will be removed from the database during an online maintenance cycle. Eventually, a steady state will be reached where your dumpster size will be equivalent to 2 weeks worth of incoming mail as a percentage of your database size. The exact percentage will depend on the amount of mail deleted and on individual mailbox sizes. The dumpster will add a percentage of overhead to the database dependent upon the mailbox size and the message delivery rate for that mailbox. For example, with a constant message delivery rate of 52MB a week, a 250MB mailbox would store approximately 104MB in the dumpster adding a 41% overhead. A 1GB mailbox storing the same 104MB in the dumpster would add a 10% overhead.
Database Size
Over time, user mailboxes will reach the mailbox quota, so an amount of mail equivalent to the incoming mail will need to be deleted in order to remain under the mailbox quota. This means that the dumpster will increase to a maximum size equivalent to two weeks' worth of incoming mail. If the majority of users have not reached the mailbox quota, only some of the incoming mail will be deleted, so the growth will be split between the dumpster and the increase in mailbox size. For example, if you take a 250MB very-heavy message profile mailbox that receives 52MB of mail per week (average message size 50KB), you'd have 104MB in the dumpster (41%), and 10MB in whitespace for a total mailbox size of 364MB. The other extreme could be a 2GB very-heavy message profile mailbox that received 52MB of mail per week, and then you'd have 104MB in the dumpster (5%), and 10MB in whitespace for a total mailbox size of 2.11GB. Fifty 2GB mailboxes in a storage group would be 105.6GB.
Here is a formula to show database size with the 2GB mailbox example:
MailboxSize = MailboxQuota + Whitespace + (WeeklyIncomingMail * 2)
MailboxSize = 2048MB + 10MB + (52MB * 2)
MailboxSize = 2048MB + 10MB + 104MB = 2162MB, or about 6% larger than the quota
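The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function and parameter names are hypothetical and are not part of any Exchange tool.

```python
def mailbox_size_on_disk_mb(quota_mb, whitespace_mb, weekly_incoming_mb,
                            dumpster_weeks=2):
    """Estimate the on-disk size of one mailbox, in MB.

    dumpster_weeks=2 reflects the 14-day deleted item retention
    default described for Exchange 2007.
    """
    dumpster_mb = weekly_incoming_mb * dumpster_weeks
    return quota_mb + whitespace_mb + dumpster_mb


def overhead_percent(quota_mb, size_on_disk_mb):
    """Return how much larger the on-disk size is than the quota, in percent."""
    return (size_on_disk_mb - quota_mb) / quota_mb * 100


# The 2GB and 250MB very-heavy profile examples from the text.
for quota in (2048, 250):
    size = mailbox_size_on_disk_mb(quota, whitespace_mb=10, weekly_incoming_mb=52)
    print(quota, size, round(overhead_percent(quota, size), 1))
```

Changing `dumpster_weeks` shows how sensitive small mailboxes are to the retention period, since the dumpster is a fixed amount of mail rather than a fixed percentage.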
Recommended Maximum Database Size
Smaller databases are always better, but your sizing needs to be balanced with other factors, especially capacity and complexity. Larger databases take longer to back up and restore, while immediately deploying the maximum of 50 databases adds complexity, with more databases and LUNs to manage. With Exchange 2007 the maximum number of databases per server is increased from 20 to 50. On servers that don't use continuous replication, we recommend that you limit the database size to 100GB. On servers that use continuous replication, we recommend that you limit the database size to 200GB. For more information, see Planning Disk Storage.
SG/DB Count
To determine the maximum number of users per database, take the maximum recommended database size and divide it by the projected mailbox size on disk. This will also help you determine how many databases you will need to handle the projected user count, assuming fully populated databases. Keep in mind, though, that due to non-transactional I/O or because of hardware limitations, you may eventually have to modify the number of users placed on a single server. Some administrators will prefer to use more databases to further shrink the database size. This can assist with backup and restore windows at the cost of more complexity in managing more databases per server. For more information on memory recommendations, see Planning processor and Memory Configurations.
Content Indexing
Content indexing creates an index, or catalog, that allows end users to search through their mail items easily and quickly rather than manually trawling through the mailbox. Exchange 2003 created a content index that was ~35-45% the capacity of your database. In that version of Exchange, content indexing was a disk-intensive, scheduled crawl through the database. Exchange 2007 creates an index that is only about 5% of the total database size, which is placed on the same LUN as the database it is indexing, on stand-alone and CCR servers. An additional 5% capacity needs to be factored into the database LUN size for content indexing.
Database Growth Factor
Now that you have the actual database size and understand how content indexing adds to the capacity needs, you need to add an overhead factor when creating the actual database LUN. For most deployments we recommend that you add an overhead factor (aka "fluff factor") of 20% to the database size (after all other factors have been considered). This value accounts for the other data that resides in the database but is not necessarily seen when calculating mailbox sizes and whitespace; for example, deleted mailboxes within the retention policy and the data structures (e.g. tables, views, internal indices, etc.) within the database add to the overall size of the database.
Maintenance Capacity
A database that needs to be repaired or compacted offline will need capacity equal to the size of the target database plus 10%. Whether you allocate enough space for a single database, a storage group, or a backup set, this space needs to be available to perform these operations. This space can also be used when restoring a corrupted database. You can rename the corrupted database to prevent it from being overwritten during the restore in case your restore is bad and a repair is necessary.
Recovery Storage Group (RSG)
If you plan to use an RSG in your disaster recovery plans, enough capacity will need to be available to handle all the databases you wish to be able to simultaneously restore on that server.
Backup to Disk
Many administrators perform a streaming online backup to a disk target. If your backup and restore design involves backup to disk, enough capacity needs to be available on the server to house this data. Depending on the backup type, this can be as small as the database and logs, or as large as the backup set you require. For example, some organizations have enough capacity on the backup LUN to handle 2 full backups plus all the incremental backups in between.
Log Capacity
The transaction log files are a record of every transaction performed by the ESE database engine. All transactions are written to the log first and then lazily written to the database. Unlike previous versions of Exchange, in Exchange 2007 the transaction logs have been reduced in size from 5MB to 1MB. The total capacity of the logs will not change, as there will be five times as many log files. This change was made to support the continuous replication features, and to minimize the amount of data loss if the primary storage fails.
The following table can be used to estimate the number of transaction logs that will be generated (per day) on an Exchange 2007 mailbox server when the average message size is 50KB:
| Mailbox Type | Message Profile | Logs Generated / Mailbox |
| Light | 5 sent/20 received | 7 |
| Average | 10 sent/40 received | 14 |
| Heavy | 20 sent/80 received | 28 |
| Very Heavy | 30 sent/120 received | 42 |
The following guidelines have been established for how message size affects log generation rate:
- If the average message size doubles to 100KB, then the logs generated / mailbox increases by a factor of 1.9 (when compared with an average message size of 50KB). This number reflects the percentage of the database that is made up of the attachments and message tables (message bodies and attachments).
- Thereafter, as message size doubles, the impact to the log generation rate per mailbox also doubles.
For example:
- If you have a message profile of Heavy and an average message size of 100KB, then the logs generated / mailbox would be 28 * 1.9 = 53.
- If you have a message profile of Heavy and an average message size of 200KB, then the logs generated / mailbox would be 28 * 3.8 = 106.
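As a rough sketch, the table and the message-size guidelines above can be combined into one helper. The function name and the `size_doublings` parameter are hypothetical; the helper only handles whole doublings of the 50KB baseline (100KB, 200KB, and so on), because the guidelines do not say how to treat sizes in between.

```python
# Logs generated per mailbox per day at a 50KB average message size,
# taken from the table above.
BASE_LOGS_PER_MAILBOX = {
    "light": 7,
    "average": 14,
    "heavy": 28,
    "very heavy": 42,
}


def logs_per_mailbox(profile, size_doublings=0):
    """Estimate daily logs per mailbox.

    size_doublings is how many times the average message size has
    doubled from 50KB: 0 -> 50KB, 1 -> 100KB, 2 -> 200KB.
    """
    if size_doublings < 0:
        raise ValueError("size_doublings must be 0 or greater")
    base = BASE_LOGS_PER_MAILBOX[profile.lower()]
    if size_doublings == 0:
        return float(base)
    # First doubling multiplies by 1.9; each later doubling multiplies by 2.
    return base * 1.9 * 2 ** (size_doublings - 1)


print(logs_per_mailbox("heavy", 1))  # 28 * 1.9
print(logs_per_mailbox("heavy", 2))  # 28 * 3.8
```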
Backup and Restore Factors
Most enterprises that perform a nightly full or incremental backup will allocate the capacity of about 3 days' worth of log files in a storage group on the transaction log LUN because the transaction logs will be truncated each night. If backup has a problem, you don't want to fill up the log drive, which would dismount the databases in the storage group(s). However, there are a few other considerations to the log LUN size. If the backup and restore design allows you to go back 2 weeks and roll forward all the logs since then, you will need two weeks' worth of log file space.
If the backup design includes weekly full and daily differential backups, then the log LUN would need to be larger than an entire week's worth of logs to allow both backup and replay during restore. If the backup design includes a weekly full and daily incremental backup, the log LUN would also need to be larger than all the logs in every incremental backup in your backup set to allow replay during restore.
Move Mailbox
Moving mailboxes is a primary capacity factor for large mailbox deployments. Most large companies move a percentage of their users on a nightly or weekly basis to different databases, servers, or sites. It may also be necessary to over provision the log LUN to accommodate user migration to Exchange 2007. While the source Exchange server will log the record deletions, which are small, it is the target server which must write everything transferred to the transaction logs first. If you generate 10GB of log files in one day, and keep a 3 day buffer of 30GB, then moving fifty 2GB mailboxes (100GB), would fill up your target log LUN and cause downtime. In cases such as these, you may have to allocate additional capacity for the log LUNs to accommodate your move mailbox practices.
Log Growth Factor
For most deployments we recommend that you add an overhead factor (aka "fluff factor") of 20% to the log size (after all other factors have been considered) when creating the log LUN to ensure necessary capacity exists in moments of unexpected log generation.
Example
Step 1: Database Size
Let's start with a 1GB mailbox with a goal of housing 4,000 very-heavy message profile mailboxes on a clustered mailbox server that is in a CCR environment. Assuming a 50KB average message size, these mailboxes receive an average of 52MB of mail per week.
| Mailbox Size | Dumpster Size (2 weeks) | Whitespace | Total Size on Disk |
| 1GB | 104MB (2x52MB) | 10MB | 1.11GB (+11%) |
Each user will consume 1.11GB of disk space. With CCR the database size should be under 200GB, so we could only house 180 mailboxes per database at the maximum. With 4000 mailboxes, we would need 23 databases, each in its own storage group, to house them. Dividing the 4000 mailboxes evenly across 23 databases gives a final count of 174 mailboxes per storage group.
| Mailboxes / DB | Total Databases | Database Size |
| 174 | 23 | 193GB |
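The Step 1 arithmetic can be reproduced with a short sketch. The function name is hypothetical, and the per-mailbox size is passed in as the rounded 1.11GB figure from the table so that the results line up with the numbers above.

```python
import math


def plan_databases(total_mailboxes, mailbox_size_gb, max_db_size_gb):
    """Return (max mailboxes per DB, DB count, mailboxes per DB, DB size in GB)."""
    # Largest whole number of mailboxes that fits under the size limit.
    max_per_db = math.floor(max_db_size_gb / mailbox_size_gb)
    # Databases needed if each one is filled to that maximum.
    db_count = math.ceil(total_mailboxes / max_per_db)
    # Spread the mailboxes evenly across the databases.
    per_db = math.ceil(total_mailboxes / db_count)
    db_size_gb = per_db * mailbox_size_gb
    return max_per_db, db_count, per_db, db_size_gb


max_per_db, db_count, per_db, db_size_gb = plan_databases(4000, 1.11, 200)
print(max_per_db, db_count, per_db, round(db_size_gb))
```

Passing 100 instead of 200 for `max_db_size_gb` shows the effect of the lower recommended limit for servers that do not use continuous replication.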
Step 2: Transaction Log Size
The transaction log LUN should be large enough to accommodate all the logs you will generate during the backup set. Many organizations that utilize a daily full strategy plan for three times the daily log generation rate in the event that backup fails. When using a weekly full and then differential or incremental backup, at least a week's worth of log capacity is required to handle the restore case. Knowing that a very heavy message profile mailbox on average generates 42 transaction logs per day, a 4000 mailbox server will generate 168,000 transaction logs each day. This can be broken down by storage group to mean that each storage group will generate 7304 logs. 10% of the mailboxes are moved per week on one day (Saturday), and we perform a weekly full and daily incremental backup. In addition, the server can tolerate 3 days without log truncation.
| Logs per SG | Log File Size | Daily Log Size | Move Mailbox Size | Incremental Restore Size | Log LUN Size |
| 7304 | 1MB | 7.13GB | 17GB (17*1GB) | 21.4GB (3*7.13GB) | 46GB ((17+21.4)*1.2) |
The transaction log LUN needs to be large enough to handle both the logs generated by the move mailbox and have enough space to restore an entire week's worth of logs.
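A minimal sketch of the Step 2 calculation follows. The function and parameter names are hypothetical. It assumes the moved mailbox count is rounded down to whole mailboxes (17 of 174), as the Move Mailbox Size column does, and it applies the 20% log growth factor to the sum of the move mailbox space and the three-day buffer.

```python
import math


def log_lun_size_gb(logs_per_sg_per_day, mailboxes_per_sg, mailbox_size_gb,
                    moved_fraction=0.10, days_without_truncation=3,
                    log_file_mb=1, growth_factor=0.20):
    """Estimate the transaction log LUN size for one storage group, in GB."""
    daily_log_gb = logs_per_sg_per_day * log_file_mb / 1024
    # Assumption: only whole mailboxes are moved, so round down.
    moved_mailboxes = math.floor(mailboxes_per_sg * moved_fraction)
    # The target server logs everything that is moved to it.
    move_gb = moved_mailboxes * mailbox_size_gb
    buffer_gb = days_without_truncation * daily_log_gb
    return (move_gb + buffer_gb) * (1 + growth_factor)


# 4000 very heavy mailboxes at 42 logs each, spread over 23 storage groups.
logs_per_sg = round(4000 * 42 / 23)
print(logs_per_sg, round(log_lun_size_gb(logs_per_sg, 174, 1)))
```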
Transactional I/O
Transactional I/O is caused by end users performing actions on the Exchange server. Retrieving, receiving, sending, and deleting items causes disk I/O. The database I/O is 8KB in size and random, though it can be a multiple of 8KB when the I/O can be coalesced. Outlook users that are not using Cached Mode Outlook are directly affected by poor server disk latency and this is one of the most important concerns in storage design. To prevent a poor user experience, Exchange storage has specific latency requirements for database and transaction log LUNs. The transaction log LUN should be placed on the fastest storage with a goal of less than ten milliseconds (<10ms) for writes. The database LUN requires read and write response times of less than twenty milliseconds (<20ms).
Understanding IOPS
How do you determine your mailbox profile, or mailbox IOPS (Database I/O per mailbox, per second)? One of the key metrics when sizing storage in Exchange 2003 was the amount of database I/O per second each mailbox consumed. In Optimizing Storage for Exchange Server 2003, we show you how to measure your mailbox IOPS. Essentially, take the amount of I/O per second (both reads and writes) on the database LUN for a storage group, and divide that by the number of mailboxes in that storage group. 1000 mailboxes causing 1000 I/Os per second on the database LUN means you have an IOPS of 1.0 per mailbox.
Measure Baseline IOPS
Now that you know how IOPS are determined, measure your Exchange 2000/2003 baseline. Exchange 2007 will affect your baseline in a couple of ways. The number of mailboxes on the server will affect the overall database cache per mailbox. The amount of RAM influences how large your database cache can grow, and a larger database cache results in more cache read hits, thereby reducing your database read I/O. The key is that knowing your IOPS on a particular server is not enough to plan out an entire enterprise, since each server's RAM and number of mailboxes and storage groups will likely be different. Once you have your actual IOPS numbers, always apply a 20% I/O growth factor to your calculations to add some head room. You don't want a poor user experience because activity is a little heavier than normal, or because your RAID array just lost a disk.
Database cache
A 64-bit Windows Server running the 64-bit version of Exchange 2007 really opens up the amount of virtual address space. Exchange can now break through the 900MB database cache barrier to significantly reduce database read I/O and enable up to 50 databases per server. If Exchange can get a read hit in the database cache, it does not have to go to disk. This has the potential to reduce your database read I/Os significantly. The database read reduction is going to be dependent upon the amount of database cache that the Exchange server has available to it and the user message profile. Guidance on memory and storage groups can be found in Planning processor and Memory Configurations. Following this guidance will help you maximize the transactional I/O reduction over Exchange 2003. The amount of database cache per user is a key factor in the actual I/O reduction.
The following table demonstrates the increase in actual database cache per user when comparing the default 900MB in Exchange 2003, versus 5MB of database cache per user in Exchange 2007. It is this additional database cache that enables more read hits in cache, thus reducing database reads at the disk level.
| Mailbox Count | E2003 DB Cache/Mailbox (MB) | E2007 DB Cache/Mailbox (MB) | DB Cache Increase over E2003 |
| 4000 | 0.225 | 5 | 22x |
| 2000 | 0.45 | 5 | 11x |
| 1000 | 0.9 | 5 | 6x |
| 500 | 1.8 | 5 | 3x |
Predict Exchange 2007 Baseline IOPS
The two largest factors that can be used to predict Exchange 2007 database IOPS are the amount of database cache per user and the number of messages each user sends and receives per day. This is based on the standard knowledge worker who uses Outlook 2007 in cached mode, and has been tested to be accurate within +/- 20%. Other client types and usage scenarios may yield inaccurate results. The predictions are only valid for user database cache sizes between 2-5MB. The formula has not been validated with users sending and receiving over 150 messages per day. The average message size for formula validation was 50KB, but the message size is not a primary factor for IOPS.
| User type (usage profile) | Send/receive per day approximately 50-kilobyte (KB) message size | DB cache per User | Estimated IOPS per user |
| Light | 5 sent/20 received | 2MB | 0.08 |
| Average | 10 sent/40 received | 3.5MB | 0.16 |
| Heavy | 20 sent/80 received | 5MB | 0.32 |
| Very heavy | 30 sent/120 received | 5MB | 0.48 |
To estimate database cache size subtract 2048MB (3072MB with LCR) from the total amount of memory installed in the Exchange Server, and divide that amount by the number of users. For example, an Exchange Server with 3000 users and 16GB of RAM would deduct 2GB for the system, leaving 14GB of RAM, or 4.77MB per user (14GB/3000=4.77MB). If the average per user database cache size is 4.77MB and the average number of messages sent and received per day is 60 we can estimate both database reads and writes.
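The cache estimate described above is simple enough to express as a small helper. The function name and the `lcr` flag are hypothetical and only illustrate the subtraction and division.

```python
def db_cache_per_user_mb(total_ram_gb, users, lcr=False):
    """Estimate the database cache available per user, in MB.

    Reserves 2048MB for the system, or 3072MB when LCR is used,
    and divides the remainder evenly across the users.
    """
    reserved_mb = 3072 if lcr else 2048
    return (total_ram_gb * 1024 - reserved_mb) / users


# The 3000-user, 16GB example from the text.
print(round(db_cache_per_user_mb(16, 3000), 2))
print(round(db_cache_per_user_mb(16, 3000, lcr=True), 2))
```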
Database Reads
First we take the 60 messages per day and multiply it by .0048 resulting in .288.
Next we take the amount of database cache per mailbox (4.77MB) to the -.65th power (4.77^-.65), resulting in .3622. Finally we multiply the 2 results to estimate our database reads per user of .104 (.288*.3622=.104).
Database Writes
For writes take the number of messages per user of 60 and multiply it by .00152 resulting in .0912 database writes per user.
The total database IOPS per user would be the addition of both reads and writes at .195 IOPS.
The formula would be ((.0048*M)*(D^-.65))+(.00152*M), where M is the number of messages and D is the database cache, per user. ((.0048*60)*(4.77^-.65))+(.00152*60) = .195
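For readers who want to try other values, here is a minimal Python sketch of the same formula. The function name is hypothetical, and the sketch does not enforce the validated ranges mentioned earlier (2-5MB of cache per user, up to 150 messages per day), so results outside them should be treated with caution.

```python
def estimate_iops_per_user(messages_per_day, cache_mb_per_user):
    """Return (reads, writes, total) database IOPS per user.

    Implements ((.0048*M)*(D^-.65))+(.00152*M), where M is messages
    sent and received per day and D is database cache per user in MB.
    """
    reads = 0.0048 * messages_per_day * cache_mb_per_user ** -0.65
    writes = 0.00152 * messages_per_day
    return reads, writes, reads + writes


# 60 messages per day with 4.77MB of database cache per user.
reads, writes, total = estimate_iops_per_user(60, 4.77)
print(round(reads, 3), round(writes, 4), round(total, 3))
```

Remember to apply the 20% I/O growth factor on top of the per-user total before multiplying by the number of mailboxes.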
The following graph demonstrates the database read and write reduction achieved when running Exchange 2007 with 4000 - 250MB mailboxes simulating Outlook 2007 in cached mode and the recommended server memory:
Effect of Online Mode Clients
Unlike cached mode clients, all online mode client operations occur against the database. As a result, read I/O operations will increase against the database. Therefore, the following guidelines have been established if the majority of clients will operate in online mode:
- 250MB online mode mailbox clients will increase database read operations by a factor of 1.5 when compared with cached mode clients. Below 250MB, the impact is negligible.
- As mailbox size doubles, the database read IOPS will also double (assuming the item distribution between key folders remains the same).
Testing has also shown that increasing the database cache beyond 5MB/mailbox will not significantly reduce the database read I/O requirements. The following graph depicts 2GB mailboxes using online mode clients and the effect increasing the cache beyond 5MB has on reducing the database read I/O requirements.
As a result of this data, two recommendations can be made:
- Deploy cached mode clients where appropriate (see the "Mailbox Size (Item Count per Folder)" section for more information).
- Ensure that the I/O requirements are taken into consideration when designing the database storage.
For additional IOPS factors, such as 3rd party clients, see Optimizing Storage for Exchange Server 2003.
Database Read & Write ratios
In Exchange 2003, the database read-to-write ratio is typically 2:1 or 66% reads. With Exchange 2007, the larger database cache decreases the number of reads to the database on disk, causing the reads to shrink as a percentage of total I/O. If you follow our recommended memory guidelines and use Outlook in cached mode, the read-to-write ratio should be closer to 1:1, which is 50% or less reads. When using Outlook in Online mode, or when using desktop search engines that do not utilize the Exchange 2007 Content Indexing Service, the read-to-write ratio will increase depending on the mailbox size (more read I/Os than write I/Os). Having more writes as a percentage of total I/O has particular implications when choosing a RAID type that has significant costs associated with writes, such as RAID5 or RAID6. Many third-party applications and handheld devices perform many reads against the database, impacting the database read-to-write ratio.
Log to DB ratio
In Exchange 2003, a transaction log LUN for a storage group requires roughly 10% as many I/Os as the databases in the storage group. For example if the database LUN is using 1000 I/Os, the log LUN would use approximately 100 I/Os. With the reduction in database reads in Exchange 2007, combined with the smaller log file size and the ability to have more storage groups, the log-to-database write ratio is roughly 1:2. For example, if the database LUN is consuming 500 write I/Os, you could expect your log LUN to consume approximately 250 write I/Os. After measuring or predicting the transactional log I/O, apply a 20% I/O overhead factor to ensure adequate headroom for busier than normal periods or hardware failure.
When using continuous replication, the primary transaction logs must be read and sent to the passive LUN. This overhead is an additional 10% in log reads. If the transaction log for a storage group is consuming 250 write I/Os, you could expect an additional 25 read I/Os when using continuous replication.
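The log I/O ratios above can be combined into a quick estimate. This is a sketch with hypothetical names. It assumes the roughly 1:2 log-to-database write ratio, and it applies the 20% overhead factor to log reads as well as log writes, which is an assumption made here for simplicity.

```python
def log_io_estimate(db_write_iops, continuous_replication=False,
                    overhead=0.20):
    """Return (log writes, log reads, total with overhead) for a log LUN."""
    # Log writes are roughly half of the database write I/O.
    log_writes = db_write_iops * 0.5
    # Continuous replication adds about 10% in log reads.
    log_reads = log_writes * 0.10 if continuous_replication else 0.0
    total_with_overhead = (log_writes + log_reads) * (1 + overhead)
    return log_writes, log_reads, total_with_overhead


# 500 database write I/Os on a server using continuous replication.
writes, reads, total = log_io_estimate(500, continuous_replication=True)
print(writes, round(reads), round(total))
```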
For other factors that impact log I/O see Optimizing Storage for Exchange Server 2003.
Mailbox Size (Item Count per Folder)
In Optimizing Storage for Exchange Server 2003, we explain how it is not the database size per se, but the number of items in your critical folders, as well as the client type, that can cause a disk performance impact. This becomes more important as mailbox size increases.
Outlook 2007 in Cached Mode is important for reducing server I/O as much as 70% over Exchange 2003. The initial mailbox sync is an expensive operation, but over time, as the mailbox size grows, the disk subsystem burden is shifted from the Exchange server to the Outlook client. This means that having a large number of items in a user's Inbox, or an end-user searching a mailbox will have little effect on the server. This also means that Cached Mode users with large mailboxes may need faster computers than those with small mailboxes (depending on the individual user threshold for acceptable performance).
Outlook 2007 Cached Mode Recommendations (Client PC):