Why is Data Engineering popular now?

The deep dive into the trend of Data Engineering and why it’s essential.

Photo by NASA on Unsplash

Data Engineering is the ‘real’ sexiest job in the 21st century. If you follow the data industry and are enthusiastic about the field, you may have seen or heard the term Data Engineering or the sexy title “Data Engineer” popping up everywhere.

If you’re curious about why and would like to dig deep into this topic, you’re in the right place!

My knowledge is limited, so I decided to ask this exact question on the r/dataengineering subreddit to understand this phenomenon.

The post drew over 20 informative and helpful responses. Below is my attempt to summarise the main ideas and give you a general sense of exactly why Data Engineering is so popular now.

Here is a simple table of contents for what will be covered:

- Trend of Data Engineering visualized
- The DNA of Data Engineering
- Why the name Data Engineer
- Factors of its popularity
- Importance of DE
- Resources and Deep Dive

The trend of Data Engineering visualized

First, to analyze this trend, let’s do some “investigating” on the terms “Data Engineering” and “Data Engineer” using Google Trends and Google Ngram Viewer.

Google Trends

As a profession — {profession: “Data Engineer”}, we get this trend:

We can observe that the term started exploding around 2016 and has been on the rise ever since.

As a search term ({search_term: [“Data Engineer”, “Data Engineering”]}), we get a trend like this:

Interestingly, there was a spike in data engineering searches back in 2004, which died down and stayed flat until interest started rising again around 2016.

Google Ngrams

A search on Google Ngrams from the year 1990 to 2019:

We can see a slight spike in the 90s and a huge inflection point in 2010, which was around the time the Data Science hype started.

The DNA of Data Engineering

Now that the trend is visualized, you might be asking, where did this huge influx come from? Jobs don’t just appear out of thin air.

The truth is that data engineering, or at least work closely resembling it, has already been around for a decade or two.

It all started in the 70s with IBM’s database management systems and the concepts of databases and ETL. In the 80s this evolved into “information engineering,” a term that described database design. Then the internet boom of the mid-90s to 00s gave rise to “big data.”

All of this created roles that work with data, such as ETL Developers, Database Developers, Database Administrators (DBAs), Big Data Developers, BI Developers, etc., whose tasks were similar to those of the data engineers we see today.

Now that you know a bit of the history of the Data Engineer, let’s ask why the role has that name before we move on to the factors.

Why the name Data Engineer?

I couldn’t pinpoint who exactly coined the term “Data Engineer,” but there are two possible reasons why the role came to be called that.

The technical reason: A big part of the name is due to the combination of traditional data roles like DBAs or ETL developers and software engineering (because of the adoption of Python and Java/Scala). Thus, the industry eventually agreed on the name “Data Engineer,” which encompasses both the data and engineering aspects of the role.

The marketing reason: Terms in this industry tend to get rebranded to sound more appealing. The story goes that “machine learning” was once “cognitive computing” until IBM changed it to a sexier title to attract clients and employees, and that the title “data scientist” was once “programming statistician.” So instead of terms like “data warehouse developer,” “cloud engineer,” or “big data developer,” you sell the title “data engineer,” a role that sounds sexy and attractive.

Factors causing this trend

I believe four main factors, really a series of events, gave rise to the data engineering trend.

  1. Explosion of data
  2. The data science hype
  3. Data companies and Cloud services
  4. The need for quality data

Explosion of data

The massive explosion of data from the internet, social media applications, and every product that generates data gave rise to the idea of using data to generate insights and make decisions from it.

The data science hype

Companies worldwide realized the power of data and embraced the idea that data is the oil of the 21st century. The data science hype began, bringing an influx of data scientists, the so-called sexiest job in the 21st century.

They feared missing out and started to hire data scientists to build exciting new data products, garner insights from data, build incredible models to make predictions, etc.

Data Companies and Cloud services

At the same time, the explosion of data and the data science hype also resulted in a wave of cloud services and data companies that help businesses do things like store data, build data pipelines, and much more.

For example, data companies like Airbyte, Fivetran, Databricks, and Snowflake and cloud services like AWS and Google Cloud Platform make it easier for businesses to work with their data.

The need for quality data

As the years passed after the hype started, data scientists, whose main job was to perform data analysis and model development, realized they had to deal with messy data and move it around.

To do that, they spent time on data collection, cleanup, storage, and so on. This meant writing data pipelines, which in turn grew into a lot of CI/CD and DevOps work because of technologies like the cloud.

Companies then realized that their data scientists were only effective if they had quality data to work with. Thus the need for data engineers, whose primary responsibility was to prepare quality data for data scientists, skyrocketed.

Now data scientists don’t have to spend 80% of their time cleaning data and can instead spend 100% of their time on what they were paid to do.
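
To make the collection, cleanup, and storage work described in this section a little more concrete, here is a minimal sketch of such a pipeline in Python. It is only an illustration and not something taken from the Reddit thread: the sample data, the `purchases` table, and the function names are all hypothetical, and a real pipeline would involve far more (scheduling, monitoring, cloud storage, and so on).

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, missing amounts, and duplicate rows.
RAW_CSV = """user_id,country,purchase_amount
1, us ,19.99
2,DE,
2,DE,
3,jp,5.00
1, us ,19.99
"""


def extract(raw_text):
    """Collect: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Clean: normalize text, drop rows with no amount, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        amount = row["purchase_amount"].strip()
        if not amount:
            continue  # skip rows with a missing purchase amount
        record = (int(row["user_id"]), row["country"].strip().upper(), float(amount))
        if record in seen:
            continue  # skip exact duplicates
        seen.add(record)
        cleaned.append(record)
    return cleaned


def load(records, conn):
    """Store: write the cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS purchases "
        "(user_id INTEGER, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(raw_text, conn):
    """Run collect, clean, and store in order; return the cleaned records."""
    records = transform(extract(raw_text))
    load(records, conn)
    return records
```

Calling `run_pipeline(RAW_CSV, sqlite3.connect(":memory:"))` leaves a small, clean `purchases` table that an analyst or data scientist could query directly instead of wrangling the raw export themselves.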

Data Engineers and Data Scientists

I’ll end this article with two good answers from my Reddit post, which offer analogies for why Data Engineers are necessary for Data Scientists to work efficiently.

Data engineering is a force multiplier where the work of loading and prepping data can be used by many in an organization and that work can scale. It enables downstream knowledge workers like analysts/scientists and BI developers to do their jobs more effectively.

Think of it as the person/team that brings all the food to a warehouse and organizes it for purchase. Instead of having to go to many small markets a person can go to one place to get everything they need and save time and have better quality. Data engineers should do this kind of work.

Having a first hire be a data scientist is like hiring a chef but not having an efficient way to get ingredients. They will spend more of their time gathering ingredients. If you hire a distributor to bring you the food that solves the travel problem but chef still needs to prep the food. An analytics engineer (or SQL developer or BI developer if you like) can assist in prep of the food like a sous chef. At that point the chef (scientist) could hopefully spend most of their time creating meals instead of prepping. In the end it is a supply chain problem. People are realizing they have a problem with ingredients before they have a problem with cooking.

— by flerkentariner

Data Science and Data Engineering are siblings, but very distinct. Being great at one doesn’t make you qualified to do the other… I used to explain it at my last job as “the scientists are the ones who pick the drop site, the engineers are the ones who jump out of the airplane.” We need each other.

— by el_tacomonkey

More about this topic

If you found this topic interesting and would like to dive deeper into it, check out these three articles. I’ve also linked some resources on Data Engineering for those who are interested.

Deep dive

Resources for Data Engineering

Thanks for reading!


Follow Bitgrit’s socials 📱 to stay updated on talks and upcoming competitions!