Understanding Data Modeling: The Foundation of Everything in the World of Data

MODELINGEN-US

Lucas Lumertz

12/6/20244 min read

a sound waveform of a sound wave
a sound waveform of a sound wave

Hey everyone! Is everything cool? I truly hope so. Today we're going to talk about a super relevant topic in the data world, even for those who still only use spreadsheets: Data Modeling.

When we talk about data, one of the first things we need to learn and remember is how to organize all that available information. And data modeling is the foundation for that. In this article, I'll explain what data modeling is, the different types that exist, and how to choose the most appropriate one for each situation. Because, as we talked about before, everything depends. Shall we explore this together in a simple and practical way? Let's go!

Let's start at the beginning. In the early days of computing, data was stored in an unorganized way, which made accessing and using it efficiently quite difficult. That's when the need to create a system to organize everything arose. Data modeling, like everything else in the technology field, evolved along with the advancement of the area and market needs. With the increase in information and the complexity of systems, new forms of modeling were created to keep pace.

For example, in the 1970s, the relational model gained a lot of traction, allowing data to be organized into simple tables. Later, with increased demand and the need for data analysis, models like the star schema and snowflake emerged, focused on storing and querying large volumes of data.

So, what exactly is data modeling, Lucão? Come with me—remember our hypothetical city from previous articles? We'll use it again here. Data modeling is like drawing the map of a city before you start building the streets and buildings. It is the process of organizing and structuring information so we can store and access it more efficiently. This helps answer important questions like: "What data do we need to save?" and "How is this data related?"

Example: Imagine a school. The data includes students, teachers, and subjects. The modeling will organize this data, connecting who the students are, who the teachers are, and what subjects they teach, all across different tables.

Now that we understand the main concept and a little bit of the evolution behind data modeling, let's talk about the main types of modeling that currently exist in the market:

1. Relational Model:

This model organizes data into tables with rows and columns. It's like a giant spreadsheet where each row is a record (e.g., a customer) and each column is a piece of information about them (e.g., name, age, email). And when more than one table is needed, they usually have relationships with each other.

Example: Think of a phone's contact table. Each row is a person, and the columns are information like phone number, address, and email.

2. Star Schema:

The star schema is widely used for data analysis. It organizes information into a central table (the fact table) connected to other smaller tables (the dimensional tables) that help categorize the data.

The fact table typically stores the measures associated with observations or certain events. For example, it might store sales orders, balances, exchange rates, and more.

The dimension tables (usually one or more) help the fact table with complementary information.

The fact table holds the company's events; it contains information that may or may not be repeated, even more than once. Each row in the table represents one fact.

The dimension table has information in unique rows that will not be repeated, which allows us to characterize, complement, explain, and better understand what we have in the fact table.

Example: Imagine you own a store. The central table stores information about sales, while the surrounding tables contain details about products, customers, and salespeople.

3. Snowflake Schema:

The snowflake schema is a more detailed version of the star schema. In it, the dimensional tables are divided into subcategories to organize the data even better.

Example: In the case of the store, in addition to having a customer table, you could divide it into subcategories, such as "address" and "purchase preferences," and so on.

Okay, I'm starting to understand, Lucas, but what is each type of modeling used for? That's what we're going to look at now:

  • Relational: Ideal for operational systems that need to handle day-to-day data, such as business management or customer service systems.

  • Star Schema: Focused on data analysis, it is used in business intelligence tools to generate fast reports and analyses.

  • Snowflake Schema: Also geared towards analysis, but for more complex and detailed scenarios, with a lot of interconnected data.

Below, I've created a table to more simply illustrate the differences between the different models:

Ah, Lucas, but which one is the best? As I said right at the beginning, it depends on the situation and the problem we're trying to solve, but I will highlight a few points to help you with the decision.

  • Relational Model: When the focus is on storing and accessing operational data quickly, such as in banking systems or ERPs, the relational model is often the best option.

  • Star Schema: For companies that need quick reports and analysis without a lot of complexity, the star schema will serve you very well and bring a lot of agility.

  • Snowflake Schema: When the data is extremely complex and detailed, and the organization requires highly specific queries, that's when we use the Snowflake schema.

Well, I think after this quick article, it is quite evident that data modeling is essential for any system that deals with information. It will help to organize, access, and use data in a much more efficient way. Choosing the right model will always depend on the objective and the complexity of the project.

If you want something simple and direct, the relational model is a great choice. For fast analysis, the star schema is ideal. But if you need something more detailed, the snowflake schema is the way to go.

Ultimately, data modeling is like drawing the blueprint of a house before building it. The better planned it is, the more efficient the final result will be. What about you? Have you thought about how you're organizing your data?

Anyway, that's it for today, everyone. All the best, and until the next topic. 😊