🧠 AI Analytics Project: Customer Churn Prediction

🧰 Technologies Used:

Pandas (Data handling)
Scikit-learn (Machine learning)
Matplotlib (Visualization)
Logistic Regression (Model)

📦 Step 1: Install Required Libraries

Make sure to install the required Python packages:

bash
-------
pip install pandas scikit-learn matplotlib

📁 Step 2: Load and Explore the Dataset

We’ll use a small, hand-made sample of a customer churn dataset, built in memory as a DataFrame (you can replace it with any CSV dataset you have).

python
-------
import pandas as pd

# Sample dataset
data = {
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
    'Churn': [0, 0, 1, 0, 1]  # 1 = churned, 0 = stayed
}

df = pd.DataFrame(data)
print(df.head())
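
If you do swap in a CSV file, loading and exploring it might look like the sketch below. The in-memory CSV text stands in for a file on disk, and the filename "churn.csv" mentioned in the comment is hypothetical; the column names are assumed to match the sample above.

```python
import io
import pandas as pd

# Stand-in for a file on disk. With a real dataset you would pass a path,
# e.g. pd.read_csv("churn.csv") (hypothetical filename), instead of StringIO.
csv_text = """Gender,Age,MonthlyCharges,Tenure,Churn
Male,34,70.0,12,0
Female,40,89.5,45,0
Female,23,29.8,2,1
Male,52,56.2,22,0
Female,30,45.5,10,1
"""

csv_df = pd.read_csv(io.StringIO(csv_text))

# Quick structural checks before any modeling
print(csv_df.shape)                     # (rows, columns)
print(csv_df.dtypes)                    # column types
print(csv_df.isna().sum())              # missing values per column
print(csv_df['Churn'].value_counts())   # class balance of the target
```

Checking missing values and class balance early is useful because both affect how the data should be split and evaluated later.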

🔍 Step 3: Preprocess the Data

Convert categorical data to numeric and split into features and target.

python
-------
from sklearn.preprocessing import LabelEncoder

# Encode categorical columns
# LabelEncoder assigns codes in sorted label order, so Female=0, Male=1
label_encoder = LabelEncoder()
df['Gender'] = label_encoder.fit_transform(df['Gender'])

# Define features and target
X = df[['Gender', 'Age', 'MonthlyCharges', 'Tenure']]
y = df['Churn']
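
To confirm which number each category received, you can read the mapping back from the fitted encoder. The sketch below also shows one-hot encoding with pd.get_dummies as an alternative that is often preferred when a column has more than two categories. The variable names here (genders, mapping, one_hot) are illustrative, not part of the project code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Same Gender values as the sample dataset, kept separate for illustration
genders = pd.Series(['Male', 'Female', 'Female', 'Male', 'Female'], name='Gender')

# LabelEncoder stores the sorted labels in classes_; the position is the code
encoder = LabelEncoder()
codes = encoder.fit_transform(genders)
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)
print(list(codes))

# One-hot alternative: one indicator column per category
one_hot = pd.get_dummies(genders, prefix='Gender')
print(one_hot.columns.tolist())
```

One-hot columns avoid implying an order between categories, which integer codes can suggest to a linear model.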

⚙️ Step 4: Train the Machine Learning Model

python
-------
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report

# Train-test split (with only 5 rows, test_size=0.2 holds out a single customer)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict
y_pred = model.predict(X_test)

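# Evaluate (scores from a single test row are illustrative only)
print("Accuracy:", accuracy_score(y_test, y_pred))
# zero_division=0 avoids warnings for classes missing from the tiny test set
print(classification_report(y_test, y_pred, zero_division=0))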
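
Because a 20% split of five rows leaves just one customer to test on, a cross-validated score can be a more informative check. The following is a minimal sketch, assuming the encoded sample data from the earlier steps. The scaler, the pipeline, and the choice of two folds are additions made for illustration, not part of the original workflow.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Encoded sample data (Female=0, Male=1), repeated so the block runs on its own
features = pd.DataFrame({
    'Gender': [1, 0, 0, 1, 0],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
})
target = pd.Series([0, 0, 1, 0, 1], name='Churn')

# Scaling inside the pipeline is refit on each training fold, so no test data leaks in
pipeline = make_pipeline(StandardScaler(), LogisticRegression())

# Only two customers churned, so two stratified folds is the most this sample allows
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, features, target, cv=cv, scoring='accuracy')

print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

On a full dataset you would typically use five or ten folds; the numbers from a five-row sample only demonstrate the mechanics.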

📊 Step 5: Visualize the Predictions

python
-------
import matplotlib.pyplot as plt

# Bar chart for churn prediction distribution
plt.bar(['Stayed', 'Churned'], [(y_pred == 0).sum(), (y_pred == 1).sum()])
plt.title('Customer Churn Prediction Results')
plt.ylabel('Number of Customers')
plt.show()

💡 Insights from AI Analytics

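Even this simple project hints at the kind of patterns churn analysis looks for. With only five rows, treat these as illustrations rather than findings:

Younger customers with lower tenure churned in the sample (ages 23 and 30, tenure 2 and 10).
Monthly charges may correlate with churn, but the direction needs checking on real data: in this sample the two churned customers had the lowest charges.
Both churned customers in the sample were female (2 of 3 female customers churned, 0 of 2 male customers).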
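
Observations like these can be checked directly against the data with a group-by summary. This is a small sketch using the sample rows from Step 2; the variable names (sample, summary, churn_rate_by_gender) are illustrative.

```python
import pandas as pd

# Sample rows from Step 2, repeated so the block runs on its own
sample = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
    'Churn': [0, 0, 1, 0, 1],
})

# Average of each numeric feature for stayed (0) vs churned (1) customers
summary = sample.groupby('Churn')[['Age', 'MonthlyCharges', 'Tenure']].mean()
print(summary)

# Share of customers who churned within each gender
churn_rate_by_gender = sample.groupby('Gender')['Churn'].mean()
print(churn_rate_by_gender)
```

Comparing group averages this way shows the direction of each relationship in the sample before you read anything into a model's output.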

📁 Summary

This Python AI analytics project covers:

Real-world problem: Churn prediction
Data preprocessing (label encoding)
Model training with Logistic Regression
Evaluation and visualization

You can scale this by:

Using a full telecom dataset (e.g., Kaggle’s “Telco Customer Churn”)
Trying different models (Random Forest, XGBoost)
Adding more features (e.g., contract type, payment method) and engineering new ones from them
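
As a starting point for trying a different model, here is a minimal sketch that swaps in scikit-learn's RandomForestClassifier on the same encoded sample. The hyperparameters are arbitrary choices for illustration, and with five rows the resulting importances are not meaningful; the point is only the shape of the code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Encoded sample data (Female=0, Male=1), repeated so the block runs on its own
features = pd.DataFrame({
    'Gender': [1, 0, 0, 1, 0],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
})
target = pd.Series([0, 0, 1, 0, 1], name='Churn')

# n_estimators and random_state are illustrative settings, not tuned values
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(features, target)

# Relative contribution of each feature to the forest's splits
importances = pd.Series(forest.feature_importances_, index=features.columns)
print(importances.sort_values(ascending=False))
```

On a full dataset, you would fit the forest on the training split only and compare it with Logistic Regression on the same held-out data.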