What is a Database?
A database is a kind of a software that can store and process massive amounts of data. There are vendors of the database software. PostgreSQL, SQL Server, and MYSQL are the few recognized vendors among several in the market.
SQL is the programming language that lets not only to manipulate the data in the DB software, but also it let us to extract and query data. Each software has its own variant of SQL. We call it as SQL dialects.
So, the question is, are all SQL dialects compatible with each other? Learning SQL is like learning English. If you learn that language, you will be able to have conversation with everyone who speaks English, but you will not understand quite of number of idioms and regional accents. On the other hand, if you start your journey by learning idioms and regional English, then surely you will not be able to understand the most common version of English which is in use all over the world. Similarly, if you will learn the basic of SQL language, still you will have problems in understanding the specific functions and syntax used in different DBMS software.
The SQL Standard
SQL was used in the early 1980s. At that time, every database software had its own language which was fully incompatible with each other. It made difficult for the database administrators to switch over and utilize the power of different software to it is full potential. However, in the late eighties, people developed and started using an SQL standard, a version of SQL which will be understood by most DBMS software. SQL became a standard of American National Standards Institute (ANSI) in 1986 and of the International Organization for Standardization (ISO) in 1987.
The SQL standard has been revised multiple times since it is inception, although the core standards have been stable and intact since the early 1992. No database is fully compatible with the standard. Most SQL scripts require changes before being compatible with the standards of the SQL.
SQL in Big Data and NoSQL Technologies
Now, SQL is matured enough not to be used in databases these days. Other technologies routinely include SQL as the language to manipulate and process data.
As the amount of data in databases ten folds, the so-called NoSQL database started gaining popularity along with it. MongoDB, Cassandra, and Redis are among the most popular NoSQL tools in the market today. Interestingly, these tools have the capability of extracting data through the standard SQL language. This is the reason; people refer it to as ‘Not Only SQL’ rather than ‘NoSQL.’
Similarly, Big Data Technologies, namely Presto, Big Query, Hive, Spark, and others use SQL for processing gigabytes of data within minutes as well.
Different SQL dialects and it is examples
Terms that we know, like SELECT, WHERE, and ORDER BY, are the basics that the developers know, and they use it day in and day out. Let us take a moment to assess how the most popular database technologies manage these basics.
PL/Pg SQL is the SQL dialect that PostgreSQL database system uses. Unlike normal standard SQL, PL/PgSQL is Turing complete, i.e., it has all of the power of a full programming language in use today. A basic structure of a PL/Pg SQL is given below:
Most procedural code used in this language will be declared as functions and can be called from regular SQL standard queries which the developers know. Here is an example that fetches a user id from the username, taken from the documentation itself:
This function can be called from the procedure below:
TSQL or Transact SQL is an SQL dialect for the Microsoft SQL Server database. As the dialect is like the standard SQL, we will take an example for a normal DQL query.
Let us assume that we want to extract top ten sales amount from a table called sales, a normal SQL query would be like this:
The same query can be simplified by using TSQL syntax as shown below:
We might think that it is only a marginal improvement, but we can also include more complexity by selecting top 2% of the rows.
You can explore more about the different type of queries and other proprietary syntax offered by SQL Server in the documentation. Additionally, one should note that TSQL is also Turing Complete like PL/Pg SQL.
PL/SQL is Oracle’s proprietary database procedural language. It works similarly to other SQL procedural languages. A base syntax of this language is as follows:
Let us look at an example where this procedure extracts and calculates the number of users that have salaries higher than ten thousand.
We are declaring variables and then selecting the values of them from the user’s table. This can then al encapsulated into a function and then called in a manner like our PL/Pg SQL example.
Which SQL dialect should I learn?
The major differences and similarities between the sql dialects are given below
PostgreSQL is open source, while the other two are proprietary. This can be a deciding factor while choosing the database system. Majority of the open-source systems have an extensive documentation and a huge user base, while proprietary software has a better support, polished software, and excellent features.
PL/PgSQL tends to be polyglot in nature, i.e., its users live in different programming languages communities. On the other hand, TSQL’s users are primarily from the Microsoft’s ecosystem communities and similarly, PL/SQL has a community which is a user of Oracle’s ecosystem.
Majority of these languages have been around for a long time. You cannot really conclude which ecosystem is better among them all. Some features are better in one software, while there are some other features which are specialty of other software. So, it is better to first go through the set of features which are of priority before you take a plunge in learning a specific SQL dialect.
There are not really differences in performances in between the SQL dialects. It depends on how much your code is optimized and the amount of data with which you are playing. Most of these functions are compiled by native C code and it does not incur any additional cost.
List of all SQL dialects
The languages mentioned in this article includes only the most popular SQL dialects. There are far more which one can research, learn, and adopt according to the needs and requirements. Below I give more resources compiled by database ORMs like SQL Alchemy, and Microsoft Entity Framework.
Since ORMs only interact with the database systems and provide various wrappers on top of SQL, they need to understand these dialects from the basics provided.
- Here is a list of dialects that Hibernate ORM supports.
- Another list is here that supports SQL Alchemy.
- Finally, a list of SQL dialects that Microsoft’s own Entity Framework supports.
SQL in the Big Data World
When the NoSQL revolution began, everyone was predicting that it will be the death of SQL. However, it turned out that SQL came back with even more force in the Big Data World. Here are some of the examples of the SQL dialects being used in the Big Data:
- Apache Hive is one of the prominent of the SQL dialects that converts it into map-reduce.
- Spark SQL is for use in Spark Big Data processing.
- CQL, it is for working with Cassandra.
- KSQL, it is for stream processing.
SQL dialects are as versatile and the whole Data Science domain is. It fulfills the needs and requirements of every developer in its unique ways. Although learning any dialect takes a considerable amount of time, but we should keep it in mind that basic syntax of all SQL dialects is the same.
SQL is enormously powerful. Hopefully, this article ahs made you aware the power of this language, it is different dialects and what do they do. As database technologies are evolving and creating newer space of their own, it will get better in the future as well.