Skills every aspiring data scientist should master and demonstrate — when presenting oneself for potential data science opportunities
Written by Kiruparan Balachandran, Manager Data Scientist at OCTAVE
The following are some very common questions I hear from aspiring data scientists:
- How can I become a data scientist?
- What are the skills required to become a data scientist?
- How can I present myself to a recruiter as a potential candidate?
Of course, answering the second and third questions also answers the first.
Hence, this series of articles highlights the critical skills an aspiring data scientist should learn, singles out the skills worth a deep dive, and shows how aspirants can present themselves for potential data science opportunities.
Pure Mathematics
Most programming languages provide libraries that let data scientists build machine learning models with little effort. Still, it is highly advisable to firm up your pure mathematics knowledge before stepping into data science. Aspirants should master two core areas of pure mathematics: calculus (e.g., optimization algorithms such as gradient descent use derivatives to find a local minimum) and linear algebra (e.g., the inputs and weights of a neural network are represented as vectors and matrices and processed with linear algebra operations).
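To make the calculus example concrete, here is a minimal sketch of gradient descent, one common optimization algorithm that uses a derivative to approach a local minimum. The function f(x) = (x - 3)^2, the starting point, the learning rate, and the number of steps are all illustrative assumptions rather than anything prescribed in this article.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to approach a local minimum."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Illustrative function: f(x) = (x - 3)^2, whose derivative is f'(x) = 2 * (x - 3).
def f_prime(x):
    return 2 * (x - 3)


# Start from an arbitrary point; the minimum of f is at x = 3.
x_min = gradient_descent(f_prime, x0=0.0)
print(round(x_min, 4))
```

Machine learning libraries run far more elaborate versions of this loop, but the idea of following the negative derivative is the same.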
Statistics
Like pure mathematics, statistics is a core skill every data scientist should master. Ideation in almost all advanced analytics projects is driven by descriptive statistics (e.g., mean, median, mode, variance, and standard deviation). Inferential statistics, on the other hand, helps you generalise from sample data to a larger population.
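As a small illustration of both ideas, the sketch below uses Python's standard statistics module. The sample values are hypothetical, and the confidence interval relies on a normal approximation, which is an assumption made for brevity; with a sample this small, a t-distribution would usually be the more appropriate choice.

```python
import math
import statistics

# Hypothetical sample, e.g. order values observed for a handful of customers.
sample = [12, 15, 15, 18, 20, 22, 15, 30]

# Descriptive statistics: summarise the sample itself.
mean = statistics.mean(sample)
median = statistics.median(sample)
mode = statistics.mode(sample)
variance = statistics.variance(sample)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(sample)      # sample standard deviation

# Inferential statistics: estimate a range for the population mean.
# Normal approximation to a 95% confidence interval (an assumption, see above).
standard_error = std_dev / math.sqrt(len(sample))
z = statistics.NormalDist().inv_cdf(0.975)
confidence_interval = (mean - z * standard_error, mean + z * standard_error)

print(mean, median, mode)
print(confidence_interval)
```

The first group of numbers describes only the data in hand, while the interval is a statement about the larger population the sample was drawn from.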
Machine Learning
Expertise in the above two topics makes this area easier to explore. Task-driven (supervised learning), data-driven (unsupervised learning), and learning from errors (reinforcement learning) are the three pillars of machine learning paradigms. A deep dive into ensemble methods such as bagging and boosting will help you understand supervised learning in detail.
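To give a feel for bagging (bootstrap aggregating), here is a minimal sketch in plain Python. It trains several very simple classifiers, each on a random resample of the data, and combines them by majority vote. The one-feature "decision stump" model, the toy data, and the helper names fit_stump, fit_bagging, and predict are all hypothetical choices made for illustration; in practice you would typically use a library implementation.

```python
import random
from collections import Counter


def fit_stump(points):
    """Pick the threshold on a single feature that misclassifies the fewest points.

    The stump predicts label 1 when x > threshold and label 0 otherwise.
    """
    best_threshold, best_errors = None, len(points) + 1
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(label) for x, label in points)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def fit_bagging(points, n_models=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(points) for _ in points]
        models.append(fit_stump(bootstrap))
    return models


def predict(models, x):
    """Aggregate the stumps' predictions by majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]


# Toy labelled data: (feature value, label).
data = [(1.0, 0), (2.0, 0), (2.5, 0), (3.0, 0),
        (3.5, 1), (4.0, 1), (5.0, 1), (6.0, 1)]

models = fit_bagging(data)
print(predict(models, 0.5), predict(models, 6.5))
```

Boosting differs in that the models are trained one after another, with each new model concentrating on the examples the earlier ones got wrong.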
Programming Language
The previous three skills give a solid foundation in the conceptual aspects of data science; thereafter, you must master a programming language to compute descriptive statistics, perform inferential statistics, and implement machine learning models. R is very popular among statisticians, and Python is widely preferred among data science professionals.
Big Data Processing
Python and R help you create proof-of-concept solutions with limited data, but in most advanced analytics projects you will end up processing millions of data points. So, it is highly recommended that you learn more about scalable machine learning libraries (e.g., Spark MLlib), storage formats that support big data processing (e.g., Parquet and Delta), and platforms (e.g., Databricks) that enable big data processing.
Problem Solving
The skills discussed so far are technical and, on their own, not sufficient for the data scientist’s role. You will also engage with clients and delivery teams to create advanced analytics solutions as part of your day-to-day work. By developing this skill, you should be able to convert a generalized problem statement into a more specific one and come up with actionable advanced analytics solutions.
Data Visualisation
With the above skills, you can perform exploratory data analysis and build models. However, a data scientist’s role also includes presenting the model outcomes to project stakeholders. You have to manage the complexity of dashboards based on their audience. Further, you must be able to decide how best to visualise the model insights and recommendations in a manner that is easily understood by and useful to the end user.
Story Telling
Like problem solving, this is another non-technical skill the aspiring data scientist should master. Frequently, it is mistaken for language fluency. However, it is about how simply you communicate complex advanced analytics solutions and outcomes to stakeholders. You may have to adapt both the content and the way you communicate it to suit the audience.
Becoming a data scientist is a journey. If you aspire to become a better data scientist, do not limit your learning to what is explained above. Practice these skills further on different forums to sharpen your knowledge.
The next article will select a few of the above topics and take a deep dive into them to add more detail.