The main advantage of SQL, the language commonly used with relational databases, over other ways of managing information (for example, classic Excel spreadsheets) is that it supports very large volumes of data while keeping query processing fast.
Structured Query Language (SQL) is a specialized programming language designed for storing, processing and analyzing information in relational databases. SQL is used to create, manage and access data, allowing you to efficiently find, filter, combine and aggregate information. It serves as a key tool for accessing data, providing a convenient and formalized way to interact with information stores.
The evolution from SEQUEL to SQL systems runs through significant products and companies such as Oracle, IBM DB2, Microsoft SQL Server, SAP, PostgreSQL and MySQL, and culminates in the emergence of SQLite and MariaDB (“SQL,” 2024).
SQL provides capabilities not found in Excel spreadsheets, making data manipulation more scalable, secure, and easy to automate:
Creating and managing data structures (DDL): In SQL you can create, modify, and delete tables in a database, establish links between them, and define data storage structures. Excel, on the other hand, works with fixed sheets and cells, without clearly defined relationships between sheets and data sets.
Data manipulation (DML): SQL allows you to add, modify, delete and retrieve data in bulk and at high speed, running complex queries with filtering, sorting and table joins (Fig. 3.1-7). In Excel, processing large amounts of information requires manual actions or special macros, which slows down the process and increases the probability of errors.
Access control (DCL): SQL allows you to assign different access rights to different users, limiting who can view or edit information. In Excel, on the other hand, access is either all-or-nothing (when the file itself is handed over) or requires complex permission settings through cloud services.
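The DDL and DML operations listed above can be illustrated with a minimal sketch that uses Python's built-in sqlite3 module and an in-memory database. The table names (projects, elements), the columns and the sample values are hypothetical and chosen only for illustration; they are not taken from this book's data.

```python
import sqlite3

# In-memory SQLite database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# DDL: create two tables and declare the relationship between them.
conn.executescript("""
CREATE TABLE projects (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE elements (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    category   TEXT NOT NULL,
    volume_m3  REAL NOT NULL
);
""")

# DML: insert several rows in one call.
conn.executemany(
    "INSERT INTO projects (id, name) VALUES (?, ?)",
    [(1, "Office"), (2, "Warehouse")],
)
conn.executemany(
    "INSERT INTO elements (project_id, category, volume_m3) VALUES (?, ?, ?)",
    [(1, "Wall", 12.5), (1, "Slab", 30.0), (2, "Wall", 8.0), (2, "Column", 2.5)],
)

# DML: modify every row that matches a condition.
conn.execute("UPDATE elements SET volume_m3 = volume_m3 * 2 WHERE category = 'Column'")
conn.commit()

# Query: join the tables, filter, aggregate and sort in a single statement.
rows = conn.execute("""
    SELECT p.name, SUM(e.volume_m3) AS total_volume
    FROM elements AS e
    JOIN projects AS p ON p.id = e.project_id
    WHERE e.category <> 'Slab'
    GROUP BY p.name
    ORDER BY total_volume DESC
""").fetchall()

print(rows)
```

Access control (DCL) is not shown here because SQLite has no GRANT or REVOKE statements. In a server database such as PostgreSQL, read-only access for a hypothetical role named analyst would typically be granted with a statement along the lines of GRANT SELECT ON elements TO analyst.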

Excel makes it easy to work with data because of its visual and intuitive structure. However, as the amount of data increases, Excel’s performance decreases. Excel also limits the amount of data it can store (a maximum of 1,048,576 rows per worksheet), and performance degrades long before this limit is reached. So while Excel is preferable for visualizing and manipulating small amounts of data, SQL is better suited to handling large data sets.
The next stage in the development of structured data was the emergence of columnar databases, an alternative to traditional relational databases that is especially useful for significantly larger data volumes and analytical calculations. Unlike row-oriented databases, where data is stored row by row, columnar databases store information column by column. Compared to classical databases, this makes it possible to:
Reduce storage space by efficiently compressing uniform data in columns.
Speed up analytic queries, as only the required columns are read, not the entire table.
Optimize Big Data workloads and data warehouses, for example in a Data Lakehouse architecture.
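The difference between the two layouts can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how any particular columnar database is implemented: the sample records are hypothetical, and run-length encoding is used here only as one example of how uniform column values can be compressed.

```python
# Row-oriented layout: every record keeps all of its fields together.
# The records below are hypothetical sample data.
rows = [
    {"id": 1, "category": "Wall", "volume_m3": 12.5},
    {"id": 2, "category": "Wall", "volume_m3": 8.0},
    {"id": 3, "category": "Wall", "volume_m3": 4.0},
    {"id": 4, "category": "Slab", "volume_m3": 30.0},
]

# Column-oriented layout: one list of values per column.
columns = {key: [record[key] for record in rows] for key in rows[0]}

# An analytic query reads only the column it needs, not whole records.
total_volume = sum(columns["volume_m3"])


def run_length_encode(values):
    """Collapse runs of equal neighbouring values into (value, count) pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return [tuple(pair) for pair in encoded]


# Uniform values stored side by side compress well.
compressed = run_length_encode(columns["category"])

print(total_volume)
print(compressed)
```

In the row layout the same total would require visiting every record and skipping the fields that are not needed; in the column layout the volume_m3 values are already stored contiguously, and the repeated category values shrink to two pairs.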
We will talk more about columnar databases, Pandas DataFrame, Apache Parquet and HDF5, as well as about building Big Data stores on top of them for data analysis and processing, in the following chapters of this book: “DataFrame: a universal tabular data format” and “Data storage formats and working with Apache Parquet: DWH-data warehouses and Data Lakehouse architecture”.