Education

Fundamentals of Data Science

From the moment I started typing this piece, in each one second, a person has produced 1.7 megabytes data. In one day, that exceeds 2.5 quintillion bytes. FYI, that is 2.5 followed by a staggering 18 zeros! This mind-boggling amount of data is increasing exponentially, because most of our actions can be translated into ones and naughts. We can use those raw data to analyze, predict patterns and trends, make a solution, or extract meaningful information by using different tools, such as machine learning algorithms. And that is, in short, data science.

Data science works to know the unknowns by creating automation systems. You might get confused by terms like big data, data mining, machine learning, deep learning with data science. These are all under the broad umbrella of data science.

Let’s look at the meaning of those terms:

Big data: All existing and increasing data are big data. The term big represents the vastness of data. The popular method to understand it is with three Vs. Volume, Velocity, and Variety.

Volume: Simply, it refers to the amount of data.
Velocity: The speed at which the data is gathered.
Variety: This tells us how diversified data is.

Big data works in real-time and has the power to mash numerous data to analyze the correlation among them. It can be structured, non-structured, non-quantifiable. One example of an application is the tech giant Facebook. It uses big data through photos, videos, comments, texts to generate and transmit data to make these accessible for everyone.

Data mining: It is a process we use to obtain patterns or trends from data. It requires databases that work as a warehouse for statistical measures and predictive analytics. Whenever we see YouTube or Netflix recommendations, we see the result of data mining. In this case, our patterns of liked or watched videos are measured through statistics.

Machine learning and deep learning: Machine learning is a remarkable branch of Artificial Intelligence. It mends the boundary of human being unable to process massive data. It is not like a computer-guided decision process as it uses past data. Machine learning perfects any predictive measures under data mining.

Furthermore, deep learning is an enhanced method of machine learning. It includes neural networks for problem-solving, using machine learning algorithms. For instance, automated cars, face recognition are the applications of machine learning.

In a nutshell, the huge data or big data is used for predicting or solving through the process of data mining. These two are used under machine learning to perfect the prediction or outcomes. Then deep learning uses these to take it to the next level. All of these and more are the chunks of data science.

Still not clear? Let’s have a visual.

Now that you get what data science is and what it comprises, let’s see how a company can earn more profit by analyzing customers’ behavior. The data science project follows the OSEMN (Obtain, Scrub, Explore, Model, Interpret) framework to get the desired outcome.

Obtain: The first step is to obtain data from available data sources or databases. Nowadays, we can store a customer’s residence, online purchase trends, and shopping history. For this, we can use Excel. In other cases, data scientists use tools like MySQL, Python, or R.
Scrub: In this part, the data gets filtered out. This is crucial as unnecessary data will lead to a meaningless result. For example, a customer’s race or color. Scrubbing data also requires replacing or extracting values. If the number of buying a particular product of one customer is missing, it will create problems in the analysis.
Explore: This part is a prerequisite of machine learning. Making sense of the collected data to question and define the problem is essential. Statistics derive the correlation between income and purchasing a product is the key action. If a customer is wealthy, he/she will buy more expensive products and brands. Age can define the difference in tastes as well.
Model: This is where the magic happens with machine learning. Here, data is reduced to the relevant ones. Then, with regression analysis, we forecast future trends. In the given example, with this process, we will understand the products that are more preferred by customers. We will also learn individual behaviors depending on the products, seasons, or other variables.
Interpret: This section does not require technical skills, rather the business knowledge or knowledge of the respective fields. The products high in demand should be in stock and vice versa, which will lead to more profit and reduce cost.

Data science has stretched its capabilities and usage from the field of technology to agriculture, sports, health, and many more. Unfortunately, not many of us have, even the basic knowledge of it. So, to compete with the data-driven future, we must start from scratch in this incredible field of data science.