
Approaching a Data Science problem
Data science has fascinated me for quite some time (the last two years or so!). And why wouldn't it? Nearly everyone in the software industry is talking about its importance, its magnificence, and how it is helping to disrupt the current way of doing business. For anyone who wants to make sense of the terminology and technology in this field, from the plethora of new jobs to the large number of new tools, the first thing to do is to understand the process of creating a data science application.
Many articles list a lot of steps, but there are six fundamental steps in any data science project.
Data Science Process
- Framing a business problem
- Collecting and extracting data from different sources
- Cleaning the data (also called data preprocessing or data wrangling)
- Exploring the data through data visualization and descriptive statistics
- Performing an in-depth analysis
- Communicating your results
This is the general process for any data science application. But there are other kinds of work in the data domain as well. These include creating a pathway or architecture to save the data generated by users, or managing the storage and processing of large amounts of data. Based on the work an individual does, the various roles one can take in the data domain are:
- Data Scientist
- Data Analyst
- Business Analyst
- Machine learning engineer
- Big data engineer
- Data Architect
- Data manager
- Statistician
Don’t worry much about these overwhelming names. What these roles do falls into a small number of categories, with only slight differences between them, and in organizations their work generally overlaps. We will eventually see who does what in this article.
1. The business problem
The most fundamental step is to identify and understand the business problem on which the various data players will get their hands dirty. This step occurs in almost every department inside a business, be it marketing, general management, customer relations, HR, or operations management, and it can even concern the overall business objective of the company. The types of questions asked here determine what kind of problem it is and who will be responsible for solving it. Here comes the distinction between data analysts and business analysts. Business analysts generally use past data to do descriptive analysis, along with some predictive analytics, to give various stakeholders insights into the processes running in the company. Data analysts, on the other hand, generally work on predictive and prescriptive analytics.
2. Data Collection
The next step is to collect the right data for the analysis. "Right" because the data collected can determine the fate of the analysis. There are a number of different ways of collecting and organizing data. Companies can have their own databases, managed and administered by data admins and data warehouse managers. Or they can get data from outside: by scraping the internet, by using publicly available or private datasets, or by calling the APIs of specific websites.
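To make the two kinds of sources concrete, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not a recipe: the in-memory SQLite table stands in for a company database, the CSV text stands in for a downloaded public dataset or an API response, and the table name, column names, and values are all made up.

```python
import csv
import io
import sqlite3

# Hypothetical in-house database: an in-memory SQLite table stands in for it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 120.0), (2, 75.5), (1, 30.0)],
)

# Hypothetical external dataset: CSV text stands in for a downloaded file
# or the body of an API response.
external_csv = "customer_id,region\n1,North\n2,South\n"
regions = {
    int(row["customer_id"]): row["region"]
    for row in csv.DictReader(io.StringIO(external_csv))
}

# Pull the internal data with SQL, then attach the external attribute.
rows = conn.execute(
    "SELECT customer_id, SUM(amount) FROM orders "
    "GROUP BY customer_id ORDER BY customer_id"
).fetchall()
conn.close()

collected = [
    {"customer_id": cid, "total_amount": total, "region": regions.get(cid)}
    for cid, total in rows
]
print(collected)
```

In a real project the connection would point at an actual database and the CSV text would come from a file or an HTTP request, but the shape of the step is the same: fetch from each source, then combine on a shared key.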
3. Data wrangling
Knowing where the data will come from does not ensure that it arrives in the format we need, or even that it is clean. The obtained data needs to be preprocessed to produce a clean, consistent dataset. This step is called data wrangling or data cleaning. One needs to identify and fix problems such as missing values, inconsistent date and time notations, misspelled words, etc. This is what consumes most of the time, roughly 90% as a rule of thumb. Finally, after a lot of data wrangling, we’re done cleaning our dataset, and we’re ready to start drawing some insights from the data. Time for some exploratory data analysis!
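The three problems named above (missing values, date notations, misspelled words) can be sketched in a few lines of standard-library Python. Everything here is assumed for illustration: the records, the list of known city names, the accepted date formats (read day-first), and the similarity cutoff are hypothetical choices, and real datasets usually need far more rules than this.

```python
from datetime import datetime
from difflib import get_close_matches

# Made-up raw records showing the three problems named above.
raw = [
    {"city": "Mumbai", "signup": "2021-03-05", "age": "34"},
    {"city": "Mumbay", "signup": "05/03/2021", "age": ""},
    {"city": "Delhi", "signup": "07.03.2021", "age": "29"},
]

KNOWN_CITIES = ["Mumbai", "Delhi"]  # assumed reference list of valid spellings
DATE_FORMATS = ["%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y"]  # assumed, day-first


def parse_date(text):
    """Try each assumed notation; return an ISO date string or None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def fix_city(name):
    """Replace a near-miss spelling with the closest known city, if any."""
    match = get_close_matches(name, KNOWN_CITIES, n=1, cutoff=0.8)
    return match[0] if match else name


def clean(record):
    """Return a cleaned copy; an empty age becomes an explicit None."""
    age = record["age"].strip()
    return {
        "city": fix_city(record["city"]),
        "signup": parse_date(record["signup"]),
        "age": int(age) if age else None,
    }


cleaned = [clean(r) for r in raw]
for row in cleaned:
    print(row)
```

Marking a missing value as None rather than guessing it is a deliberate choice in this sketch: whether to drop, keep, or fill such values depends on the analysis that follows.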
4. Data Exploration
To gain a basic understanding of what our data is telling us, we do exploratory data analysis, which involves data visualization and descriptive statistics. It’s like taking a bird’s eye view of our data. So we do things like checking correlations between columns, spotting patterns visually, etc.
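As a small illustration of the descriptive-statistics side of this step, the sketch below summarizes one column and checks the correlation between two. The numbers are invented, the column names are hypothetical, and the Pearson coefficient is written out by hand so that the example needs nothing beyond the standard library.

```python
import statistics

# Made-up cleaned dataset: monthly advertising spend and sales.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
sales = [25.0, 41.0, 58.0, 79.0, 102.0]


def describe(values):
    """Basic descriptive statistics for one numeric column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = describe(sales)
r = pearson(ad_spend, sales)
print(summary)
print("correlation between ad spend and sales:", round(r, 3))
```

A coefficient close to 1 suggests the two columns move together, which is the kind of hint that tells us what to plot and what to look at more closely in the next step.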
5. Data Analysis
This is where we do predictive analytics. In the previous step we gained insight into our data and tried to answer simple business-related questions. Here comes the actual power of data analytics, where we use complex machine learning algorithms to predict and forecast the future.
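To show what "predict and forecast" means in the simplest possible terms, here is a sketch that fits a straight line to past values by ordinary least squares and extrapolates one step ahead. This is a deliberately simple stand-in for the complex machine learning algorithms mentioned above, not one of them, and the monthly sales figures are made up.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up history: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [25.0, 41.0, 58.0, 79.0, 102.0]

slope, intercept = fit_line(months, sales)

# Forecast the next, unseen month by extending the fitted line.
next_month = 6
forecast = slope * next_month + intercept
print(f"forecast for month {next_month}: {forecast:.1f}")
```

Real predictive work would hold back part of the data to test how well the model generalizes; this sketch skips that to keep the idea of "learn from the past, predict the future" visible.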
6. Result Communication
Once we have done all the previous steps, we might think that most of the work is done. But this last step, which seems so easy at first glance, plays an extremely important role. Once we have our insights, it’s extremely important to convey the results to the various stakeholders. Although the final decision lies in the hands of the manager, how well informed that decision is remains our responsibility.
Differentiating data roles
We have now covered the difference between data analysts and business analysts. But to recap briefly