Microsoft SQL Server utilizes a client-server database structure, in which the process usually starts with a client application submitting a request to the server. The SQL Server then processes the request and delivers the desired outcome.
Below is an overview of the primary components of SQL database architecture, illustrated with a diagram. But before that, if you wish to learn SQL, you can enroll in some SQL Certification course offered by reputed Institutes. There are three key elements:
- SQL Server Network Interface (Protocol Layer)
- Relational Engine
- Storage Engine
Now, let’s delve into a thorough examination of each individual component.
SQL Server Network Interface (SNI) Layer: An Overview
Let’s begin by examining the layer that enables SQL Server to interact with external environments. This is known as the SQL Server Network Interface, which houses SQL Server protocols. The following protocols are primarily associated with SQL Server:
- Shared memory
- Named pipes
- TCP/IP
- TDS (Tabular Data Stream)
We can also mention a protocol called Virtual Interface Adapter (VIA). However, it necessitates additional hardware setup and has been deprecated by Microsoft, making it unavailable in the most recent SQL Server versions.
Now, let’s explore each of these protocols in depth.
Shared Memory
Shared memory, being the most straightforward protocol, is employed by default for local connections when both the client application and SQL Server reside on the same machine.
Named Pipes
Named pipes is a protocol suited for situations where the client application and SQL Server are linked through a local area network (LAN). This protocol is disabled by default, but it can be activated using the SQL Configuration Manager. The standard port for named pipes in SQL Server is 445.
TCP/IP
TCP/IP, the most crucial among these protocols, serves as the primary means of remotely connecting to SQL Server via an IP address and port number. This is necessary when the client application and SQL Server are installed on different machines. Although the default TCP port for SQL Server is 1433, it is advisable to modify the port after SQL Server installation.
Tabular Data Stream
Tabular Data Stream is an application-level protocol employed to transmit requests and responses between client applications and SQL Server systems. Clients typically maintain a long-lasting connection with the server. Once this connection is established, TDS messages facilitate communication between the client application and the database server. It is also feasible for a server to function as a client, necessitating a distinct TDS connection.
Database Engine
The database engine is the central component of SQL Server architecture, responsible for data storage and processing. It consists of two elements: the relational engine and the storage engine. The relational engine handles query processing, while the storage engine manages data storage and retrieval from storage systems. Let’s now delve into a detailed examination of SQL Server engine architecture for both engine types.
Relational Engine
The relational database engine in SQL Server, often referred to as the query processor, is responsible for executing queries, as well as managing memory, threads, tasks, and buffers. Essentially, it retrieves data from the storage engine and processes the results to return to the user.
The relational engine consists of three primary components: Query Parser, Optimizer, and Query Executor.
CMD Parser (Query Parser)
The CMD Parser (also known as the Query Parser) is the component that receives the query, evaluates it for syntactic and semantic errors, and ultimately produces a query tree.
The syntactic check assesses the user’s input query for correct syntax. If the query does not adhere to SQL syntax, the Parser returns an error message.
The semantic check, in its most basic form, verifies whether the queried table and column exist in the database schema. If the check is successful, the queried table is associated with the query. If the table/column does not exist, the Parser returns an error message. The complexity of the semantic check is dependent on the complexity of the query.
Lastly, the Parser generates an execution tree for the query.
Optimizer
The Optimizer’s goal is to generate the most cost-effective execution plan for the query, minimizing its run-time as much as possible. Not all queries require optimization; primarily, DML commands such as SELECT, INSERT, UPDATE, and DELETE are marked and sent to the Optimizer. Their cost calculations are based on CPU and memory usage, as well as input/output requirements.
The optimization process consists of three phases.
Phase 0: Trivial Plan (a.k.a. pre-optimization)
Sometimes, creating an optimized plan is unnecessary, particularly when the extra cost of searching for an optimized plan would not significantly reduce run-time. First, the Optimizer looks for a trivial plan. If none is available, it moves on to the first phase.
Phase 1: Transaction Processing Plan
During this step, the Optimizer searches for an optimized transaction processing plan. Initially, it looks for a simple plan that includes one index per table; if not found, the Optimizer seeks a more complex plan with multiple indexes per table.
Phase 2: Parallel Processing & Optimization
If the Optimizer cannot reduce the query run-time using the aforementioned methods, it turns to parallel processing, which relies on the user’s machine’s processing capabilities and configuration. If that is not possible, the final optimization begins, aiming to find any other viable options to execute the query in the most cost-effective manner.
Query Executor
Lastly, the Query Executor invokes the access method. After obtaining the necessary data from the storage engine, it moves on to the Server Network Interface within the protocol layer and is then transmitted to the user. That’s the entire process!
Storage Engine
The storage engine is responsible for managing the storage and retrieval of actual data upon user requests. It comprises a buffer manager and a transaction manager that interact with data and log files based on the query.
Data File Types
Before delving into the types, let’s briefly discuss data files in general. SQL Server stores its data (e.g., user and system tables, indexes) and SQL code (e.g., stored procedures, views, functions) in various types of data files. These files physically store data in pages, each of which is 8KB in size, representing the smallest data storage unit in SQL Server. A page comprises a page header containing metadata such as page type, page number, used and free space, pointers to previous and subsequent pages, and so on. The header size is 96 bytes.
Data pages are logically organized into extents, with each extent consisting of 8 pages.
There are three data files to be aware of.
Primary files
Every SQL Server database has one primary file, typically with an .mdf extension. It stores the main data associated with database tables, views, triggers, and other objects.
Secondary files
Secondary files are optional, and a database may have multiple or none at all. These files contain user-specific data and have an .ndf extension.
Log files
Log files, which have an .ldf extension, are utilized for managing and potentially rolling back unwanted (and uncommitted) transactions. They can also help address possible vulnerabilities and implement corresponding security measures as part of database hardening.
Access Method
Moving on to the SQL Server storage engine architecture, we’ll begin with the access method, which determines if the user’s query consists of a SELECT or non-SELECT statement. If it’s a SELECT statement, the query is sent to the Buffer Manager for processing. If it’s a non-SELECT statement, the query is directed to the Transaction Manager.
Buffer Manager
The SQL Server Buffer Manager is responsible for essential functions such as managing the plan cache, data parsing, and handling dirty pages.
Plan Cache
Initially, the Manager checks for an existing execution plan stored in the Plan Cache. If one is found, it is used along with the associated data.
It’s important to note that storing a query execution plan in the Plan Cache enables faster availability when your SQL Server receives the same query in the future. This is particularly beneficial for more complex queries.
Data Parsing
The Buffer Manager facilitates access to the requested data. If an execution plan is present in the Plan Cache, soft parsing occurs. The data is located in the Data Cache and used by the Query Executor. This approach offers better performance due to the reduced number of I/O operations.
If there is no execution plan in the Plan Cache, hard parsing takes place, and data must be retrieved from the data storage.
Transaction Manager
The Transaction Manager comes into play when the query consists of a non-SELECT statement. It manages the transaction using the Log Manager and the Lock Manager. The Log Manager maintains a record of all updates in the system through transaction logs. The Lock Manager, on the other hand, locks the data involved in each transaction, ensuring adherence to the ACID properties.
It is important to highlight write-ahead logging (WAL), which helps achieve atomicity and durability (two of the ACID properties) in database systems. The general principle of WAL is as follows: before any changes are written to the database, they are first logged in the transaction log, which is subsequently written to stable storage. Having a detailed log of the operations allows for the possibility of redoing those actions and rebuilding all the data from scratch, if necessary.
SQLOS
Now let’s discuss the role of SQLOS in the SQL Server architecture.
The SQL Server Operating System (SQLOS) is an application layer that underpins the SQL Server database engine layer. It manages a variety of crucial scheduling and resource management tasks:
- Scheduling of threads for CPU consumption and I/O completion
- Thread synchronization
- Memory allocation and consumption control (when Plan Cache, Lock Manager, or other components request memory from SQLOS)
- Tracking of extended events
- Exception handling (for both user- and system-generated exceptions)
- Deadlock detection mechanism
- Hosting of services and memory management for external components
In conclusion: Utilize the Best Tools for SQL Server Databases
If SQL Server Management Studio doesn’t meet all your daily needs, consider enhancing it with a collection of applications and add-ins called dbForge SQL Tools. These tools offer a wide range of capabilities, including:
- IntelliSense-like SQL completion, refactoring, and formatting
- Schema comparison and synchronization
- Table data comparison and deployment of changes
- Visual query building without coding
- Realistic test data generation
- Integrated version control
- Generation of comprehensive database documentation
- Versatile data export and import
- Database and table design
- Administration and security management
- Data analysis
Download SQL Tools for a free 30-day trial and experience all of their features firsthand!