Industrial Data Science: Before model development

By Shafeek Saleem

Published in

OCTAVE — John Keells Group

Jan 10, 2024

Data science is an interdisciplinary field that uses statistical and computational techniques to extract insights and knowledge from data. It combines elements of computer science, statistics, and domain expertise to analyze and interpret large datasets. Industrial data science is the application of data science techniques and tools to solve problems and drive innovation in industrial settings. It involves collecting, analyzing, and interpreting data to improve processes, increase revenue, optimize operations, and enhance decision-making. The potential of data-driven applications in industrial processes has encouraged the industry to invest in machine learning and data science teams in recent years.

When developing an end-to-end machine learning or data-driven solution to a business problem, the process typically involves the following stages:

1. Business problem understanding

2. Define objectives

3. Value capture methodology

4. Data Collection

5. Data Preprocessing

6. Model Development (Model selection, training, evaluation, and optimization)

7. Pilot Testing

8. Production Deployment

9. Monitoring

Business problem understanding

The first step is to identify the business problem that the organization is trying to solve. This helps ensure that the analytical solution is aligned with the business objectives and is designed to meet the specific needs of the organization. This is where business domain knowledge plays a crucial part. Not all problems require machine learning; understanding the business problem helps determine what kind of approach is required to solve it. The key question to answer here is, “How can advanced analytics be leveraged to solve the business problem?”

Define the objectives

Once the business problem has been identified, the next step is to define the objectives of the analytical solution. This may involve identifying the specific outcomes that the organization wants to achieve, such as increasing revenue, reducing customer churn rate, or improving customer satisfaction.

Objectives should be measurable and time-bound. Measurable objectives can be quantified or evaluated against specific metrics or criteria, which allows progress to be tracked and success to be measured. Time-bound objectives have a deadline by which that success is judged.

Ex: Reduce customer churn rate by 15% within the next six months.
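A churn objective of this kind can be tracked with a few lines of code. The following is a minimal sketch, not a prescribed method: it assumes the 15% is a relative reduction from a baseline churn rate, and the `Objective` class, the baseline of 0.20, and the deadline are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Objective:
    """A measurable, time-bound objective (all fields are illustrative)."""
    metric: str
    baseline: float       # metric value when the objective was set
    target_change: float  # relative change wanted, e.g. -0.15 for a 15% reduction
    deadline: date

    def target_value(self) -> float:
        # Value the metric must reach for the objective to count as met.
        return self.baseline * (1 + self.target_change)

    def is_met(self, current: float, today: date) -> bool:
        # Time-bound: a result reached after the deadline does not count.
        if today > self.deadline:
            return False
        # Measurable: compare the current value with the target value.
        if self.target_change < 0:
            return current <= self.target_value()
        return current >= self.target_value()


# Hypothetical numbers: baseline churn of 20%, deadline six months out.
churn = Objective("monthly churn rate", baseline=0.20,
                  target_change=-0.15, deadline=date(2024, 7, 10))
print(round(churn.target_value(), 4))         # 0.17
print(churn.is_met(0.165, date(2024, 6, 1)))  # True
```

Writing the objective down in this form forces both the metric and the deadline to be stated explicitly before any modeling starts.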

Value capture

The primary objective of industrial data science projects is to create value for the business. While model accuracy is important, the highest possible accuracy is not always necessary to achieve the desired business impact. It is important to focus on delivering actionable insights and solutions that address business problems and drive tangible results.

Industrial data science projects often operate within constraints such as time, budget, and data quality. Once the business problem and the objectives are defined, and before the solution is implemented, an estimated ROI (Return on Investment) should be assessed to see how much economic value it brings to the organization. In its simplest form, when investing in an advanced analytics solution, the benefits should outweigh the costs.
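In its simplest form, the ROI estimate is the net benefit divided by the cost. The sketch below illustrates that arithmetic only; the function name `estimated_roi` and every cost and benefit figure are hypothetical, and a real assessment would also need to account for uncertainty in the benefit estimate.

```python
def estimated_roi(expected_benefit: float, total_cost: float) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (expected_benefit - total_cost) / total_cost


# Hypothetical cost breakdown and expected benefit for one project.
costs = {"team": 60_000, "infrastructure": 15_000, "data acquisition": 5_000}
expected_benefit = 120_000

roi = estimated_roi(expected_benefit, sum(costs.values()))
print(f"Estimated ROI: {roi:.0%}")  # Estimated ROI: 50%
```

A positive result means the expected benefits outweigh the costs; a negative one suggests the project should be rescoped or dropped before implementation begins.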

Data Collection

Data collection is a critical step in industrial data science, and it requires careful planning, monitoring, and documentation to ensure that the data is of high quality and can be used effectively for analysis and modeling. Here are some key considerations:

● Identify the right data sources that will be used for the analysis.

● Define the data requirements for the project, such as the data volume, data format, and data quality.

● Establish processes for collecting and storing data. This may involve setting up data pipelines or data warehouses, as well as defining data access and security protocols.

● Monitor data quality to ensure that the data collected is accurate and complete. This may involve setting up data validation checks or data cleaning processes to identify and correct errors or missing values.

● Document the data collection process.
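The data requirements and validation checks mentioned above can be expressed as a small set of rules applied to each incoming record. This is a minimal sketch under stated assumptions: the column names, value ranges, and the `validate_record` helper are hypothetical, and production pipelines typically rely on a dedicated validation tool rather than hand-written checks.

```python
from typing import Any

# Hypothetical data requirements: column -> (expected type, min, max).
REQUIREMENTS = {
    "sensor_id": (str, None, None),
    "temperature_c": (float, -50.0, 150.0),
}


def validate_record(record: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems found in one record."""
    problems = []
    for column, (expected_type, low, high) in REQUIREMENTS.items():
        value = record.get(column)
        if value is None:
            problems.append(f"{column}: missing value")
            continue
        if not isinstance(value, expected_type):
            problems.append(f"{column}: expected {expected_type.__name__}")
            continue
        # Range check applies only to columns that define bounds.
        if low is not None and not (low <= value <= high):
            problems.append(f"{column}: {value} outside [{low}, {high}]")
    return problems


# Illustrative records: one clean, one incomplete, one out of range.
records = [
    {"sensor_id": "A1", "temperature_c": 21.5},
    {"sensor_id": "A2", "temperature_c": None},
    {"sensor_id": "A3", "temperature_c": 900.0},
]
report = {r["sensor_id"]: validate_record(r) for r in records}
for sensor_id, problems in report.items():
    print(sensor_id, problems or "ok")
```

Keeping the requirements in one place, separate from the checking logic, also serves as part of the documentation of the collection process.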

Data Preprocessing

Data preprocessing helps ensure that the data is in a suitable format for analysis and modeling. It involves:

● Data Cleaning: Clean the data by identifying and correcting errors, inconsistencies, and missing values.

● Data Integration: Integrate data from multiple sources to create a unified dataset.

● Data Transformation: Transform the data to make it suitable for analysis.

● Feature Engineering: Engineer new features from the existing data to improve the performance of machine learning algorithms.
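The four steps above can be sketched on a toy dataset. The example uses only the Python standard library to stay self-contained; the machine readings, the choice of mean imputation and min-max scaling, and the `units_per_hour` feature are all assumptions made for illustration, not recommendations for any particular project.

```python
from statistics import mean

# Two hypothetical sources, both keyed by machine_id.
readings = [
    {"machine_id": "M1", "temperature_c": 70.0, "output_units": 100},
    {"machine_id": "M2", "temperature_c": None, "output_units": 80},
    {"machine_id": "M3", "temperature_c": 90.0, "output_units": 120},
]
runtime_hours = {"M1": 10.0, "M2": 8.0, "M3": 10.0}

# 1. Data cleaning: fill missing temperatures with the mean of observed values.
observed = [r["temperature_c"] for r in readings if r["temperature_c"] is not None]
fill_value = mean(observed)
cleaned = [
    {**r, "temperature_c": fill_value if r["temperature_c"] is None else r["temperature_c"]}
    for r in readings
]

# 2. Data integration: join the second source on machine_id.
integrated = [{**r, "runtime_hours": runtime_hours[r["machine_id"]]} for r in cleaned]

# 3. Data transformation: min-max scale temperature to the range [0, 1].
#    Assumes the column is not constant, otherwise the denominator is zero.
temps = [r["temperature_c"] for r in integrated]
t_min, t_max = min(temps), max(temps)
transformed = [
    {**r, "temperature_scaled": (r["temperature_c"] - t_min) / (t_max - t_min)}
    for r in integrated
]

# 4. Feature engineering: derive throughput from two existing columns.
dataset = [
    {**r, "units_per_hour": r["output_units"] / r["runtime_hours"]}
    for r in transformed
]

for row in dataset:
    print(row)
```

Each step produces a new list rather than modifying the previous one, which makes intermediate results easy to inspect when a step misbehaves.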

Data preprocessing in industrial data science is a critical but time-consuming process, and it is often considered part of the Extract, Transform, Load (ETL) process in data warehousing and business intelligence. For this reason, organizations often invest time and resources in building an automated ETL process. Data engineers play a key role by designing, building, and maintaining the data infrastructure necessary for data collection, storage, and processing.

Conclusion

By leveraging the power of data, companies can optimize their operations, improve their products and services, increase their revenue, and create new opportunities for growth and success. It is important to ensure that the analytics solution being developed is effective and useful in practice, and that it continues to deliver value over time.