🧰 Technologies Used:
- Pandas (Data handling)
- Scikit-learn (Machine learning)
- Matplotlib (Visualization)
- Logistic Regression (Model)
📦 Step 1: Install Required Libraries
Make sure to install the required Python packages:
```bash
pip install pandas scikit-learn matplotlib
```
📁 Step 2: Load and Explore the Dataset
We’ll use a simplified version of a customer churn dataset (you can replace this with any CSV dataset you have).
```python
import pandas as pd

# Sample dataset
data = {
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
    'Churn': [0, 0, 1, 0, 1]  # 1 = churned, 0 = stayed
}

df = pd.DataFrame(data)
print(df.head())
```
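If you do swap in your own CSV, the loading and exploration step might look like the sketch below. The CSV text is embedded with `io.StringIO` so the example runs as-is; with a real file you would pass its path to `pd.read_csv` instead (the name `churn.csv` in the comment is a hypothetical placeholder, not a file this tutorial provides).

```python
import io

import pandas as pd

# Stand-in for a real file, using the same columns as the sample dataset.
# With a real file you would call pd.read_csv("churn.csv") instead
# (the file name is a hypothetical placeholder).
csv_text = """Gender,Age,MonthlyCharges,Tenure,Churn
Male,34,70.0,12,0
Female,40,89.5,45,0
Female,23,29.8,2,1
Male,52,56.2,22,0
Female,30,45.5,10,1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Quick exploration: size, column types, missing values, class balance
print(df.shape)
print(df.dtypes)

missing_per_column = df.isna().sum()
print(missing_per_column)

churn_rate = df["Churn"].mean()  # fraction of rows with Churn == 1
print(f"Churn rate: {churn_rate:.0%}")
```

Checking missing values and class balance before modeling is cheap, and both affect how you preprocess and evaluate later.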
🔍 Step 3: Preprocess the Data
Convert categorical data to numeric and split into features and target.
```python
from sklearn.preprocessing import LabelEncoder

# Encode categorical columns
label_encoder = LabelEncoder()
df['Gender'] = label_encoder.fit_transform(df['Gender'])  # Male=1, Female=0

# Define features and target
X = df[['Gender', 'Age', 'MonthlyCharges', 'Tenure']]
y = df['Churn']
```
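`LabelEncoder` assigns integer codes in sorted order of the labels, which is why Female becomes 0 and Male becomes 1. The sketch below shows one way to confirm that mapping. It also shows, as an alternative this tutorial does not use, how `pd.get_dummies` would one-hot encode the same column, which is often a safer choice for features with more than two categories.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

gender = pd.Series(['Male', 'Female', 'Female', 'Male', 'Female'], name='Gender')

# Fit the encoder and recover the label -> code mapping from classes_
encoder = LabelEncoder()
codes = encoder.fit_transform(gender)
mapping = {label: code for code, label in enumerate(encoder.classes_)}
print(mapping)

# Alternative (not used in this tutorial): one indicator column per category
one_hot = pd.get_dummies(gender, prefix='Gender')
print(one_hot.columns.tolist())
```

Integer codes imply an ordering between categories, which is harmless for a two-valued column but can mislead a linear model when there are three or more.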
⚙️ Step 4: Train the Machine Learning Model
```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Train-test split
# Note: with 5 rows, test_size=0.2 leaves a single test customer,
# so the scores below only demonstrate the workflow.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

# Evaluate (zero_division=0 avoids warnings when a class has no predictions)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred, zero_division=0))
```
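A single held-out row gives a very noisy accuracy estimate. One common alternative is stratified cross-validation, sketched below on the same five rows with `Gender` already encoded. The `StandardScaler` step and the choice of two folds are assumptions made for this illustration, not part of the tutorial's model; on real data you would typically use five or more folds.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Five-row sample with Gender already encoded (Male=1, Female=0)
X = pd.DataFrame({
    'Gender': [1, 0, 0, 1, 0],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
})
y = pd.Series([0, 0, 1, 0, 1], name='Churn')

# Scale features inside the pipeline so each fold is scaled on its own training rows
pipeline = make_pipeline(StandardScaler(), LogisticRegression())

# Stratified folds keep churned and stayed customers in every split;
# only 2 folds are possible here because just 2 customers churned
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring='accuracy')

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Putting the scaler inside the pipeline keeps test-fold statistics out of training, which matters more as the dataset grows.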
📊 Step 5: Visualize the Predictions
```python
import matplotlib.pyplot as plt

# Bar chart for churn prediction distribution
plt.bar(['Stayed', 'Churned'], [(y_pred == 0).sum(), (y_pred == 1).sum()])
plt.title('Customer Churn Prediction Results')
plt.ylabel('Number of Customers')
plt.show()
```
💡 Insights from AI Analytics
Five rows are far too few for reliable conclusions, but the sample illustrates the kind of patterns this analysis can surface:
- Younger customers with lower tenure may have a higher chance of churning.
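- High monthly charges may correlate with churn in larger datasets, although this sample does not show it: the two churned customers have the lowest monthly charges.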
- Female customers churned more often in the sample (2 of 3, versus 0 of 2 male customers).
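One way to check observations like these against the data is to compare group averages. The sketch below rebuilds the five-row sample and uses `groupby` to summarize the numeric features by churn status and the churn rate by gender. The variable names `group_means` and `churn_by_gender` are chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
    'Churn': [0, 0, 1, 0, 1],  # 1 = churned, 0 = stayed
})

# Average age, charges, and tenure for stayed (0) vs churned (1) customers
group_means = df.groupby('Churn')[['Age', 'MonthlyCharges', 'Tenure']].mean()
print(group_means)

# Share of customers who churned within each gender
churn_by_gender = df.groupby('Gender')['Churn'].mean()
print(churn_by_gender)
```

The same two `groupby` calls work unchanged on a larger dataset that has these column names.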
📁 Summary
This Python AI analytics project covers:
- Real-world problem: Churn prediction
- Data preprocessing (label encoding)
- Model training with Logistic Regression
- Evaluation and visualization
You can scale this by:
- Using a full telecom dataset (e.g., Kaggle’s “Telco Customer Churn”)
- Trying different models (Random Forest, XGBoost)
- Adding feature engineering (e.g., contract type, payment method)
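As a rough sketch of the last two ideas, the example below trains a Random Forest on a categorical contract-type feature that has been one-hot encoded. The data is randomly generated for illustration only: the `Contract` values and the rule linking them to churn are made-up assumptions, so the resulting accuracy says nothing about real customers.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Synthetic stand-in data (assumption: not taken from any real dataset)
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    'Tenure': rng.integers(1, 60, size=n),
    'MonthlyCharges': rng.uniform(20, 100, size=n).round(1),
    'Contract': rng.choice(['Month-to-month', 'One year', 'Two year'], size=n),
})

# Made-up rule: month-to-month customers churn more often
churn_prob = np.where(df['Contract'] == 'Month-to-month', 0.6, 0.1)
df['Churn'] = (rng.random(n) < churn_prob).astype(int)

X = df[['Tenure', 'MonthlyCharges', 'Contract']]
y = df['Churn']

# One-hot encode the categorical column; pass numeric columns through unchanged
preprocess = ColumnTransformer(
    [('contract', OneHotEncoder(handle_unknown='ignore'), ['Contract'])],
    remainder='passthrough',
)
model = Pipeline([
    ('preprocess', preprocess),
    ('forest', RandomForestClassifier(n_estimators=100, random_state=42)),
])

# Stratify so both classes appear in the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

Keeping the encoder and the model in one `Pipeline` means the same preprocessing is applied at training and prediction time, and swapping in another classifier only changes the final step.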