Employee Attrition Rate Prediction Using ZenML and Streamlit



Are you working in HR and struggling to predict whether the employees in your team will continue working or are considering leaving the organisation? No worries! You don't need to be an astrologer to predict this; by using the power of Data Science, we can predict it with a trained model. Let us begin our journey into employee attrition rate prediction with a simple yet powerful MLOps tool called ZenML, together with Streamlit.

Learning Objectives

In this article, we will learn:

  • What is ZenML? Why and how to use it?
  • Why use MLflow and how to integrate it with ZenML?
  • The need for a deployment pipeline
  • Implementation of the employee attrition rate project and making predictions

This article was published as a part of the Data Science Blogathon.

Project Implementation

Problem Statement: Predict whether an employee will leave an organisation or not based on several factors like age, income, performance, etc.

Solution: Build a Logistic Regression model to predict the attrition rate of an employee

Dataset: IBM HR Analytics Employee Attrition & Performance


Before seeing the implementation of our project, let us first see why we are using ZenML here.

Why ZenML?

ZenML is a simple and powerful MLOps orchestration tool used to create ML pipelines, cache pipeline steps and save computational resources. ZenML also offers integration with multiple ML tools, making it one of the best tools for creating ML pipelines. We can track our model steps and evaluation metrics, view our pipelines visually in dashboards, and much more.

In this project, we will implement a traditional pipeline that uses ZenML, and we will integrate MLflow with ZenML for experiment tracking. We will also implement a continuous deployment pipeline using the MLflow integration with ZenML, which ingests and cleans the data, trains the model, and redeploys the model when its predictions meet a minimum evaluation criterion. With this pipeline, we can make sure that if a new model performs better than the previous model's threshold prediction value, the MLflow deployment server is updated with the new model instead of the old one.


Common ZenML Terms

  • Pipelines: Sequence of steps in our project.
  • Components: Building blocks or a particular function in our MLOps pipeline.
  • Stacks: Collection of components, local or in the cloud.
  • Artifacts: Input and output data of a step in our project, stored in the artifact store.
  • Artifact Store: Storage space for storing and version tracking of our artifacts.
  • Materializers: Components that define how artifacts are stored in and retrieved from the artifact store.
  • Flavors: Solutions for specific tools and use cases.
  • ZenML Server: Deployment for running stack components remotely.

Pre-requisites and Basic ZenML Commands

  • Activate your virtual environment:
#create a virtual environment
python3 -m venv venv
#Activate your virtual environment in your project folder
source venv/bin/activate

All the basic ZenML commands with their functionalities are given below:

#Install zenml
pip install zenml

#Install the extras needed to launch the zenml server and dashboard locally
pip install "zenml[server]"

#to see the zenml version:
zenml version

#To initiate a new repository
zenml init

#to run the dashboard locally:
zenml up

#to know the status of our zenml pipelines
zenml show

These commands are necessary to know to work with ZenML.

Integration of MLflow with ZenML

We are using MLflow as the experiment tracker to track our model, artifacts and hyperparameter values. We register the stack components (the experiment tracker and the model deployer) and the stack here:

#Integrating mlflow with ZenML
zenml integration install mlflow -y

#Register the experiment tracker
zenml experiment-tracker register mlflow_tracker_employee --flavor=mlflow

#Registering the model deployer
zenml model-deployer register mlflow_employee --flavor=mlflow

#Registering the stack
zenml stack register mlflow_stack_employee -a default -o default -d mlflow_employee -e mlflow_tracker_employee --set

ZenML Stack List

[Figure: ZenML stack list]

Project Structure

employee-attrition-prediction/          # Project directory
├── data/
│   └── HR-Employee-Attrition.csv       # Dataset file
├── pipelines/
│   ├──          # Deployment pipeline
│   ├──            # Training pipeline
│   └──
├── src/                                # Source code
│   ├──                # Data cleaning and preprocessing
│   ├──                   # Model evaluation
│   └──                    # Model development
├── steps/                              # Code files for ZenML steps
│   ├──                  # Ingestion of data
│   ├──                   # Data cleaning and preprocessing
│   ├──                  # Train the model
│   ├──                   # Model evaluation
│   └──
├──                    # Streamlit web application
├──                   # Code for running deployment and prediction pipeline
├──                     # Code for running training pipeline
├── requirements.txt                    # List of project required packages
├──                           # Project documentation
└── .zen/                               # ZenML directory (created automatically after ZenML initialization)

Data Ingestion

We first ingest the data from the HR-Employee-Attrition dataset in the data folder.

import pandas as pd
from zenml import step

class IngestData:
    def get_data(self) -> pd.DataFrame:
        # Read the raw dataset from the project's data folder
        df = pd.read_csv("./data/HR-Employee-Attrition.csv")
        return df

@step
def ingest_data() -> pd.DataFrame:
    ingestor = IngestData()
    df = ingestor.get_data()
    return df

@step is a decorator used to turn the function ingest_data() into a step of the pipeline.

Exploratory Data Analysis

#Understand the data
# See how the data looks
# Check the sample data
#Check the null values

#Check the percentage of people who stayed and left the company:
df_left = df[df['Attrition'] == "Yes"]
df_stayed = df[df['Attrition'] == "No"]
left_percentage = len(df_left) / len(df) * 100
stayed_percentage = len(df_stayed) / len(df) * 100
print(f"The percentage of people who left the company is: {left_percentage}")
print(f"The percentage of people who stayed in the company is: {stayed_percentage}")

#Analyse the differences in features between people who stayed and people who left the company

[Figure: output]

  • The employees who left the job had worked fewer years in the company.
  • The employees who left the company were younger than the employees who stayed.
  • The employees who left live farther from the office than those who stayed.
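
The stayed/left percentages computed above can also be sketched without pandas. This is only an illustration: the labels below are made-up values rather than rows from the IBM dataset, and attrition_share is a hypothetical helper name.

```python
from collections import Counter

def attrition_share(labels):
    """Return the percentage of each Attrition label in the list."""
    counts = Counter(labels)
    total = len(labels)
    return {label: 100.0 * n / total for label, n in counts.items()}

# Hypothetical sample: 2 of 8 employees left
sample = ["No", "No", "Yes", "No", "No", "Yes", "No", "No"]
shares = attrition_share(sample)
print(f"Left: {shares['Yes']:.1f}%, stayed: {shares['No']:.1f}%")
```

A strongly unequal split like this is worth noting before training, because accuracy alone can look good on imbalanced classes.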

Data Cleaning and Processing

  • Data Cleaning: We removed the unwanted columns in the dataset, such as "EmployeeCount", "EmployeeNumber" and "StandardHours", then changed the features whose values are only Yes or No to binary 1 or 0.
  • One-Hot Encoding: Then, we applied one-hot encoding to the categorical columns 'BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole' and 'MaritalStatus'.
import logging
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# DataStrategy is assumed to be the abstract base class defined in the project's src/ code
class DataPreProcessStrategy(DataStrategy):
    """This class is used to preprocess the given dataset"""
    def __init__(self, encoder=None):
        self.encoder = encoder

    def handle_data(self, data: pd.DataFrame) -> pd.DataFrame:
        try:
            print("Column Names Before Preprocessing:", data.columns)
            data = data.drop(["EmployeeCount", "EmployeeNumber", "StandardHours"], axis=1)
            if 'Attrition' in data.columns:
                print("Attrition column found in data.")
            else:
                print("Attrition column not found in data.")
            # Map Yes/No style columns to 1/0
            data["Attrition"] = data["Attrition"].apply(lambda x: 1 if x == "Yes" else 0)
            data["Over18"] = data["Over18"].apply(lambda x: 1 if x == "Yes" else 0)
            data["OverTime"] = data["OverTime"].apply(lambda x: 1 if x == "Yes" else 0)

            # Extract categorical variables
            cat = data[['BusinessTravel', 'Department', 'EducationField', 'Gender', 'JobRole', 'MaritalStatus']]

            # Perform one-hot encoding on categorical variables
            onehot = OneHotEncoder()
            cat_encoded = onehot.fit_transform(cat).toarray()
            # Convert cat_encoded to DataFrame
            cat_df = pd.DataFrame(cat_encoded)

            # Extract numerical variables
            numerical = data[['Age', 'Attrition', 'DailyRate', 'DistanceFromHome', 'Education', 'EnvironmentSatisfaction', 'HourlyRate', 'JobInvolvement', 'JobLevel', 'JobSatisfaction', 'MonthlyIncome', 'MonthlyRate', 'NumCompaniesWorked', 'Over18', 'OverTime', 'PercentSalaryHike', 'PerformanceRating', 'RelationshipSatisfaction', 'StockOptionLevel', 'TotalWorkingYears', 'TrainingTimesLastYear', 'WorkLifeBalance', 'YearsAtCompany', 'YearsInCurrentRole', 'YearsSinceLastPromotion', 'YearsWithCurrManager']]

            # Concatenate cat_df and numerical
            data = pd.concat([cat_df, numerical], axis=1)

            print("Column Names After Preprocessing:", data.columns)
            print("Preprocessed Data:")
            return data
        except Exception as e:
            logging.error(f"Error in preprocessing the data: {e}")
            raise e


After all the data cleaning and processing is done, the data looks like this. As the image shows, the data consists only of numerical values once the encoding is complete.

[Figure: output]

Splitting the Data

We will then split the data into training and testing datasets in the ratio of 80:20.

import logging
from typing import Union
import pandas as pd
from sklearn.model_selection import train_test_split

class DataDivideStrategy(DataStrategy):
    def handle_data(self, data: pd.DataFrame) -> Union[pd.DataFrame, pd.Series]:
        try:
            # Check if 'Attrition' is present in the data
            if 'Attrition' in data.columns:
                X = data.drop(['Attrition'], axis=1)
                Y = data['Attrition']
                X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
                return X_train, X_test, Y_train, Y_test
            else:
                raise ValueError("'Attrition' column not found in data.")
        except Exception as e:
            logging.error(f"Error in data handling: {str(e)}")
            raise e
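
As a rough illustration of what an 80:20 split with a fixed seed means, the sketch below shuffles a copy of the rows reproducibly and cuts off the test share. It uses only the standard library; split_80_20 is a hypothetical helper, and scikit-learn's own shuffling and rounding differ in detail.

```python
import random

def split_80_20(rows, test_size=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and cut off the test share."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]

train, test = split_80_20(range(10))
print(len(train), len(test))
```

Fixing the seed means every run sees the same split, so changes in the metrics come from the model rather than from a different sample.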

Model Training

Since it is a classification problem, we are using Logistic Regression here; we could also use other classification algorithms such as Random Forest Classifier, Gradient Boosting, etc.

from zenml import pipeline

@pipeline
def training_pipeline(data_path: str):
    df = ingest_data(data_path)
    X_train, X_test, y_train, y_test = clean_and_split_data(df)
    model = define_model()  # Define your machine learning model
    trained_model = train_model(model, X_train, y_train)
    evaluation_metrics = evaluate_model(trained_model, X_test, y_test)

Here, the @pipeline decorator is used to define the function training_pipeline() as a pipeline in ZenML.
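
For readers who want to see what logistic regression does under the hood, here is a minimal single-feature sketch trained with batch gradient descent. It is an illustration only, not the project's model code: the data is a made-up toy set (a centred "years at company" feature), and train_logistic and predict are hypothetical names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit weight w and bias b for a single feature with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the average log loss with respect to w and b
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical toy data: years at company minus the average -> left (1) or stayed (0)
years_centred = [-3, -2, -1, 1, 2, 3]
left = [1, 1, 1, 0, 0, 0]
w, b = train_logistic(years_centred, left)

def predict(x):
    """Classify as 1 (left) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0

print([predict(x) for x in years_centred])
```

The learned weight is negative here, meaning a shorter tenure pushes the predicted probability of leaving up in this toy set.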


For binary classification problems, we use evaluation metrics such as accuracy, precision, F1 score, ROC-AUC, etc. We import classification_report from the scikit-learn library to calculate the evaluation metrics and give us the classification report.

import logging
import numpy as np
from sklearn.metrics import classification_report

class ClassificationReport:
    def calculate_scores(self, y_true: np.ndarray, y_pred: np.ndarray):
        try:
            logging.info("Calculate Classification Report")
            # output_dict=True returns the report as a nested dictionary
            report = classification_report(y_true, y_pred, output_dict=True)
            logging.info(f"Classification Report:\n{report}")
            return report
        except Exception as e:
            logging.error(f"Error in calculating Classification Report: {e}")
            raise e
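
The main numbers in a classification report can be reproduced by hand. The sketch below computes accuracy, precision, recall and F1 for the positive class from two label lists; binary_scores is a hypothetical helper, and the labels are made up for illustration.

```python
def binary_scores(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels: 1 = left, 0 = stayed
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1]
scores = binary_scores(y_true, y_pred)
print(scores)
```

For attrition, recall on the "left" class is often the number to watch, since a missed leaver is usually the costlier mistake.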

Classification Report:

[Figure: classification report]

To see the dashboard of the training_pipeline, we need to run the following script:

from zenml import pipeline
from pipelines.training_pipeline import train_pipeline
from zenml.client import Client
import pandas as pd

if __name__ == "__main__":
    uri = Client().active_stack.experiment_tracker.get_tracking_uri()

which returns the tracking dashboard URL. It looks like this:
Dashboard URL:
You can click the URL and view your training pipeline in the ZenML dashboard. Here, the whole pipeline image is split into different image parts to show it more clearly in detail.

[Figure: pipeline]
[Figure: pipeline]

Overall, the training_pipeline looks like this in the dashboard:


Model Deployment

Deployment Trigger

class DeploymentTriggerConfig(BaseParameters):
    min_accuracy: float = 0.5

In this class, DeploymentTriggerConfig, we set a minimum accuracy parameter, which specifies what our minimum model accuracy should be.

Setting Up the Deployment Trigger

def deployment_trigger(
    accuracy: float,
    config: DeploymentTriggerConfig,
) -> bool:
    return accuracy > config.min_accuracy

Here, the deployment_trigger() function is used to deploy the model only when it exceeds the minimum accuracy. We will cover why we have used caching here in the next section.
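
The trigger logic can be tried out without ZenML. In this sketch, TriggerConfig is a plain dataclass standing in for DeploymentTriggerConfig and should_deploy stands in for deployment_trigger(); both names are hypothetical and neither is a ZenML class or step.

```python
from dataclasses import dataclass

@dataclass
class TriggerConfig:
    """Stand-in for DeploymentTriggerConfig; not a ZenML class."""
    min_accuracy: float = 0.5

def should_deploy(accuracy: float, config: TriggerConfig) -> bool:
    """Return True only when accuracy is strictly above the threshold."""
    return accuracy > config.min_accuracy

config = TriggerConfig()
print(should_deploy(0.87, config))  # accuracy above the 0.5 threshold
print(should_deploy(0.50, config))  # equal to the threshold does not pass
```

Note the strict comparison: a model that only matches the threshold is not deployed.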

Continuous Deployment Pipeline

@pipeline(enable_cache=False, settings={"docker": docker_settings})
def continuous_deployment_pipeline(
   data_path: str,
   #data_path="C:/Users/user/Desktop/machine learning/Project/zenml Pipeline/Customer_Satisfaction_project/data/olist_customers_dataset.csv",
   workers: int = 1,
):
   # Clean the data and split into training/test sets
   ...

Here, in continuous_deployment_pipeline(), we ingest the data, clean the data, train our model, evaluate it, and deploy the model only if it passes the deployment_trigger() condition. This way, we can make sure that the new model is deployed only if its prediction accuracy exceeds the threshold value, which is the previous model's prediction accuracy. This is how continuous_deployment_pipeline() works.

Caching refers to storing the output of previously executed steps in the pipeline. The outputs are stored in the artifact store. We use caching in the pipeline parameters to indicate that, when there is no change between the outputs of previous runs and the currently running step, ZenML will reuse the output of the previous run. Enabling caching speeds up pipeline runs and saves computational resources. But in situations where the inputs, parameters or outputs change dynamically, as in our continuous_deployment_pipeline(), turning caching off is the better choice. So, we have written enable_cache=False here.
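
The idea behind step caching can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not how ZenML implements caching: the in-memory dictionary stands in for an artifact store, and cached_step, clean and calls are hypothetical names.

```python
import hashlib
import json

_artifact_store = {}   # hypothetical in-memory stand-in for an artifact store
calls = {"n": 0}       # counts how often the step body actually runs

def cached_step(func):
    """Reuse a stored output when the step is called again with the same inputs."""
    def wrapper(*args, enable_cache=True):
        # Key the cache on the step name and its (JSON-serialisable) inputs
        key = hashlib.sha256(json.dumps([func.__name__, args]).encode()).hexdigest()
        if enable_cache and key in _artifact_store:
            return _artifact_store[key]
        result = func(*args)
        _artifact_store[key] = result
        return result
    return wrapper

@cached_step
def clean(values):
    calls["n"] += 1
    return [v for v in values if v is not None]

clean([1, None, 2])                      # computed
clean([1, None, 2])                      # served from the cache
clean([1, None, 2], enable_cache=False)  # recomputed because caching is off
print(calls["n"])
```

With caching disabled the step body runs every time, which is the behaviour wanted when the data behind the same inputs may have changed.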

Inference Pipeline

We use the inference pipeline to make predictions on new data, based on the deployed model. Let's see how we used this pipeline in our project.


def inference_pipeline(pipeline_name: str, pipeline_step_name: str):
   #print("Data Shape for Inference:", data.shape)  # Print the shape of data for inference
   return prediction

Here, inference_pipeline() works in the following order:

  • dynamic_importer(): First, dynamic_importer() loads the new data and prepares it.
  • prediction_service_loader(): prediction_service_loader() loads the deployed model, based on the pipeline name and step name parameters.
  • predictor(): Then, predictor() is used to predict on the new data based on the deployed model.
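
The order of the three steps can be sketched with plain Python stand-ins. Nothing below calls ZenML or MLflow: FakeService is a hypothetical replacement for a deployed model service, its rule is a toy threshold rather than a trained model, and the data values are made up.

```python
import json

def dynamic_importer() -> str:
    """Stand-in: return new data as a JSON string (values are made up)."""
    return json.dumps({"data": [[29, 3200], [45, 9100]]})

class FakeService:
    """Hypothetical stand-in for a deployed model service."""
    def predict(self, rows):
        # Toy rule, not a trained model: flag rows whose second value is below 5000
        return [1 if row[1] < 5000 else 0 for row in rows]

def prediction_service_loader() -> FakeService:
    """Stand-in: return the service that would wrap the deployed model."""
    return FakeService()

def predictor(service: FakeService, data: str):
    """Decode the JSON batch and ask the service for predictions."""
    rows = json.loads(data)["data"]
    return service.predict(rows)

def inference_pipeline():
    batch = dynamic_importer()             # 1. load and prepare new data
    service = prediction_service_loader()  # 2. load the deployed model service
    return predictor(service, batch)       # 3. predict on the new data

print(inference_pipeline())
```

Keeping data loading, service lookup and prediction as separate steps means each one can be swapped or tested on its own.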

Let us look at each of these functions below:

dynamic_importer()

def dynamic_importer() -> str:
   data = get_data_for_test()
   return data

Here, it calls get_data_for_test(), which loads the new data, processes it and returns it.


def prediction_service_loader(
   pipeline_name: str,
   pipeline_step_name: str,
   model_name: str = "model",
):
   # Look up deployment services matching the given parameters
   existing_services = mlflow_model_deployer_component.find_model_server()
   if not existing_services:
      raise RuntimeError(
         f"No MLFlow deployment service found for pipeline {pipeline_name}, step {pipeline_step_name} and model {model_name} and pipeline for the model {model_name} is currently running"
      )

Here, in prediction_service_loader(), we load the deployment service for the deployed model based on the parameters. A deployment service is a runtime environment where our deployed model is ready to accept inference requests and make predictions on new data. The line existing_services=mlflow_model_deployer_component.find_model_server() searches for any existing deployment service based on the given parameters, such as pipeline name and pipeline step name. If no existing service is available, it means the deployment pipeline has not been executed yet, or there is an issue with the deployment pipeline, so it throws a RuntimeError.


def predictor(
    service: MLFlowDeploymentService,
    data: str,
) -> np.ndarray:
    """Run an inference request against a prediction service"""

    service.start(timeout=21)  # should be a NOP if already started
    data = json.loads(data)
    columns_for_df = [
        ...
    ]
    df = pd.DataFrame(data["data"], columns=columns_for_df)
    # Convert each row to a dict, then to an array the service can accept
    json_list = json.loads(json.dumps(list(df.T.to_dict().values())))
    data = np.array(json_list)
    prediction = service.predict(data)
    return prediction

Once we have the deployed model and the new data, we can use predictor() to make the predictions.
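
The reshaping inside predictor(), from a JSON string with a "data" list to one dictionary per row, can be sketched without pandas. This is an illustrative equivalent, not the project's code: to_records is a hypothetical helper, and the column names and values are examples.

```python
import json

def to_records(payload: str, columns):
    """Turn a JSON string with a "data" list of rows into one dict per row."""
    rows = json.loads(payload)["data"]
    return [dict(zip(columns, row)) for row in rows]

# Example payload with two rows and two illustrative columns
payload = json.dumps({"data": [[30, 5000], [41, 7200]]})
records = to_records(payload, ["Age", "MonthlyIncome"])
print(records)
```

Pairing each value with its column name keeps the request self-describing, so a column-order mistake is easier to spot.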

To see the continuous deployment and inference pipelines visually, we need to run the deployment script, where the configurations to deploy and predict are defined.

@click.option(
    "--config",
    type=click.Choice([DEPLOY, PREDICT, DEPLOY_AND_PREDICT]),
    help="Optionally you can choose to only run the deployment "
    "pipeline to train and deploy a model (`deploy`), or to "
    "only run a prediction against the deployed model "
    "(`predict`). By default both will be run ",
)

Here, we can run either the continuous deployment pipeline or the inference pipeline by using these commands:

#The continuous deployment pipeline

#To see the inference pipeline (that is, to deploy and predict)
python --config predict
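
How a --config flag selects which pipelines run can be sketched with the standard library's argparse in place of click. The option values "deploy" and "predict" come from the help text above; "deploy_and_predict" and the helper names parse_config and stages_to_run are assumptions made for this illustration.

```python
import argparse

# "deploy_and_predict" is an assumed value for the DEPLOY_AND_PREDICT constant
DEPLOY, PREDICT, DEPLOY_AND_PREDICT = "deploy", "predict", "deploy_and_predict"

def parse_config(argv):
    """Parse a --config flag restricted to the three run modes."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--config",
        choices=[DEPLOY, PREDICT, DEPLOY_AND_PREDICT],
        default=DEPLOY_AND_PREDICT,
    )
    return parser.parse_args(argv).config

def stages_to_run(config):
    """Map the chosen mode to the pipelines that should run."""
    stages = []
    if config in (DEPLOY, DEPLOY_AND_PREDICT):
        stages.append("continuous_deployment_pipeline")
    if config in (PREDICT, DEPLOY_AND_PREDICT):
        stages.append("inference_pipeline")
    return stages

print(stages_to_run(parse_config(["--config", "predict"])))
print(stages_to_run(parse_config([])))
```

Restricting the flag to a fixed set of choices makes a typo fail at the command line instead of silently running the wrong pipeline.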

After executing the commands, you can see the ZenML dashboard URL, like this:

Dashboard URL:

Enjoy your pipeline visualisations in the dashboard:

Continuous Deployment Pipeline


The continuous deployment pipeline (from ingestion of data to mlflow_model_deployer_step) looks like this:


Inference Pipeline


Building a Streamlit Application


Streamlit is an open-source, Python-based framework used to create UIs. We can use Streamlit to build web apps quickly, without knowing backend or frontend development. First, we need to install Streamlit on our PC.
The commands to install Streamlit and run its server on our local system are:

#install streamlit on our local PC
pip install streamlit

#to run the streamlit local web server
streamlit run


import json
import numpy as np
import pandas as pd
import streamlit as st
from PIL import Image
from pipelines.deployment_pipeline import prediction_service_loader
from run_deployment import main

# Define a global variable to keep track of the service status
service_started = False

def start_service():
    global service_started
    service = prediction_service_loader(
        ...
    )
    service.start(timeout=21)  # Start the service
    service_started = True
    return service

def stop_service(service):
    global service_started
    service.stop()  # Stop the service
    service_started = False

def main():
    st.title("Employee Attrition Prediction")

    # Sidebar sliders collect the input features from the user
    age = st.sidebar.slider("Age", 18, 65, 30)
    monthly_income = st.sidebar.slider("Monthly Income", 0, 20000, 5000)
    total_working_years = st.sidebar.slider("Total Working Years", 0, 40, 10)
    years_in_current_role = st.sidebar.slider("Years in Current Role", 0, 20, 5)
    years_since_last_promotion = st.sidebar.slider("Years Since Last Promotion", 0, 15, 2)

    if st.button("Predict"):
        global service_started
        if not service_started:
            service = start_service()

        input_data = {
            "Age": [age],
            "MonthlyIncome": [monthly_income],
            "TotalWorkingYears": [total_working_years],
            "YearsInCurrentRole": [years_in_current_role],
            "YearsSinceLastPromotion": [years_since_last_promotion],
        }

        df = pd.DataFrame(input_data)
        json_list = json.loads(json.dumps(list(df.T.to_dict().values())))
        data = np.array(json_list)
        pred = service.predict(data)
        st.success(
            "Predicted Employee Attrition Probability (0 - 1): {:.2f}".format(
                pred[0]
            )
        )

        # Stop the service after prediction
        if service_started:
            stop_service(service)

if __name__ == "__main__":
    main()

Here, we have created a Streamlit web app named "Employee Attrition Prediction", in which users can provide inputs such as age, monthly income, etc. to make a prediction. When the user clicks the "Predict" button, the input data is sent to the deployed model, and the prediction is made and displayed to the user. This is how our streamlit_app works. When we run the file, we get the network URL like this:


By clicking the network URL, we can see the Streamlit UI used to make predictions.


You can view all your stacks, the components used, and the number of pipelines run in the ZenML dashboard, making your MLOps journey easy.

ZenML Dashboard:

[Figure: stack components]

Number of Pipelines:

[Figure: pipeline]

Number of runs:

[Figure: pipeline]


We have successfully built an end-to-end employee attrition rate prediction MLOps project. We ingested the data, cleaned it, trained the model, evaluated the model, triggered the deployment, deployed the model, searched for existing model services and, when one exists, predicted on new data, and took user inputs from the Streamlit web app to make predictions, which will help the HR department make data-driven decisions.

GitHub Code:


  • ZenML acts as a powerful orchestration tool, with integrations for other ML tools.
  • The continuous deployment pipeline makes sure that only models passing the accuracy threshold are deployed, which helps keep prediction accuracy high.
  • Caching helps us save resources, and logging helps us track the pipeline and supports debugging and error tracking.
  • Dashboards help us get a clear view of the ML pipeline workflow.

Frequently Asked Questions

Q1. Is ZenML free to use?

A. Yes, ZenML is a free open-source MLOps tool, but using ZenML Cloud, with its cloud servers and extra support from their team, costs additionally.

Q2. What makes Streamlit better than FastAPI, Flask, Shiny to create user interfaces?

A. Unlike Streamlit, using FastAPI/Flask/Shiny requires strong knowledge of HTML/CSS to create interactive UIs. In Streamlit, we don't need front-end knowledge to use it.

Q3. What is the need for integrating MLflow with ZenML?

A. While ZenML provides a framework to manage and orchestrate ML pipelines, by integrating with MLflow we can track our ML experiments, their artifacts and parameters, and log metrics. So, we can get more information about the execution of steps.

Q4. What measures should the company take after predicting whether the employee will leave or not?

A. The company should create retention strategies to keep skilled employees who are at high risk of leaving, by making salary adjustments, creating engaging programs for them, offering training programs for their career and personal growth, and ensuring a good work environment that improves both the employee's career growth and the company's growth.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author's discretion.
