- Use the smallest data type necessary to implement the required functionality. I have worked on several systems which used NUMERIC or FLOAT data types even though not a single row contained digits to the right of the decimal point. An easy way to optimize such systems is to simply change the data type from NUMERIC to INTEGER.
- Ensure that each column has a descriptive name; doing so makes database more self-documenting and easy to understand for those who didn’t develop it. Do not call columns “column 2,” “ID” or similar names. If you must abbreviate column names be sure to create a data dictionary denoting what data element each column is supposed to store.
- Many tables have “natural” keys; these are columns that have a business meaning, such as customer or account name. Although natural keys are immediately identifiable by business users, they aren’t always unique and they tend to change over time. Consider adding “surrogate” keys – columns that have no business meaning but can uniquely identify each row. Identity columns are a great example of surrogate keys.
While you could use a combination of natural keys to uniquely identify a record, joining tables on multiple columns will normally be slower than joining the same tables based on a single column with a small data type (such as INTEGER). If you do use surrogate keys be sure that their name includes the table / entity name. For example, do not add a column called “ID” to each table. Instead use “customer_id”, “supplier_id”, “store_id” and so forth.
- Each table allows up to 1,024 columns, but normally you don’t need nearly as many columns. For transactional systems ensure the data model is highly normalized; this means the same data element (customer address, customer phone number, product description, etc) should not be repeated in multiple tables. For reporting systems you can allow some redundancy, but only if thorough testing confirms that redundant columns improve query performance.
- If possible and appropriate, use fixed-length as opposed to variable-length data types. For example, if you know product code will always be limited to 50 characters use CHAR(50) as opposed to VARCHAR(50). Variable length columns impose an overhead that isn’t always necessary.
- Use UNICODE data types (NCHAR, NVARCHAR, NTEXT) only when necessary. If your database will only contain European characters, then you shouldn’t have to use UNICODE data types. Realize that UNICODE data types use twice-as-much storage as their non-UNICODE counterparts.
- Be sure to specify appropriate collation for your string columns. If you don’t specify the collation, SQL Server will use the collation defined at the database level. Collation determines the character set and sort order supported by the column. If the correct collation isn’t specified, you could see unexpected results when retrieving string data.
- Attempt to configure column null-ability correctly. If the column should always have a value, then configure it as NOT NULL. Using default constraints is more efficient than having columns allowing NULL values. A NULL value isn’t equal to anything else – empty string, zero or even other NULL values. A NULL denotes that the value is unknown. With a character column, you can often use a default value of “unknown” as opposed to allowing NULL values.
- Use large variable length data types sparingly. Variable length columns are normally stored on the same data pages as the rest of the record. However, if the combined size of the variable-length columns exceeds 8,000 characters, they’re stored on row-overflow data pages, which imposes additional overhead during data retrieval.
- Avoid using TEXT, NTEXT and IMAGE data types for any newly created table columns. These data types are deprecated and might not be available in future versions of SQL Server. Use VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX) data types instead.
- If you must implement multiple large columns, such as VARCHAR(MAX) or VARCHAR(4000) for example, consider splitting your table in two or more tables with fewer columns. This technique is sometimes called vertical partitioning. These tables will have one-to-one relationships among them and each will have a common primary key column.
- You cannot use columns with large variable length data types, such as VARBINARY(MAX), NVARCHAR(MAX) or VARCHAR(MAX) for clustered or non-clustered index keys. However, consider adding such columns as included columns to your non-clustered indexes. Doing so can help you have more “covered” queries, which can be resolved by seeking through the index as opposed to scanning the table. The included columns are not counted towards the 900 byte limit for index keys.They are also not counted towards the 16 column limit of index keys.
- The TIMESTAMP data type is a misnomer because it doesn’t track date or time values. Rather SQL Server uses a column with this data type to track the sequence of data modifications. Instead of checking each column within the table you can simply examine the values of the TIMESTAMP column to determine whether any column values have been modified.
- Consider using BIT data type columns for Boolean values, as opposed to storing “TRUE / FALSE”, “yes/no” or other character strings. SQL Server can store up to 8 columns with BIT data type in a single byte. One scenario where this could be handy is when you’re using “soft deletes.” Instead of physically removing a record from a table, you simply tag it as deleted, by flipping the bit value of the “deleted” column to 1.
Keep these pointers in mind when creating and maintaining SQL Server tables. Although creating tables might seem like a trivial exercise, database architects should carefully weigh the consequences of each option when building large scale systems.