🧰 Technologies Used:
- Pandas (Data handling)
- Scikit-learn (Machine learning)
- Matplotlib (Visualization)
- Logistic Regression (Model)
📦 Step 1: Install Required Libraries
Make sure to install the required Python packages:
bash
-------
pip install pandas scikit-learn matplotlib
📁 Step 2: Load and Explore the Dataset
We’ll use a simplified version of a customer churn dataset (you can replace this with any CSV dataset you have).
python
-------
import pandas as pd
# Sample dataset
data = {
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
    'Churn': [0, 0, 1, 0, 1]  # 1 = churned, 0 = stayed
}
df = pd.DataFrame(data)
print(df.head())
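If your data lives in a CSV file instead, a minimal sketch of loading and exploring it might look like the following. The in-memory `csv_text` string and the name `df_csv` are stand-ins chosen for illustration so the snippet runs on its own; with a real file you would pass its path to `pd.read_csv`.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (same rows as the sample dataset above).
# With a real file, pass its path to pd.read_csv instead of a StringIO object.
csv_text = """Gender,Age,MonthlyCharges,Tenure,Churn
Male,34,70.0,12,0
Female,40,89.5,45,0
Female,23,29.8,2,1
Male,52,56.2,22,0
Female,30,45.5,10,1
"""

df_csv = pd.read_csv(io.StringIO(csv_text))

# Quick exploration: size, column types, and class balance of the target
print(df_csv.shape)
print(df_csv.dtypes)
print(df_csv['Churn'].value_counts())
```

Checking the class balance early is useful because churn datasets are often imbalanced, which affects how accuracy should be read later.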
🔍 Step 3: Preprocess the Data
Convert categorical data to numeric and split into features and target.
python
-------
from sklearn.preprocessing import LabelEncoder
# Encode categorical columns
label_encoder = LabelEncoder()
df['Gender'] = label_encoder.fit_transform(df['Gender'])  # Male=1, Female=0
# Define features and target
X = df[['Gender', 'Age', 'MonthlyCharges', 'Tenure']]
y = df['Churn']
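To see which integer each category receives, you can inspect the encoder's `classes_` attribute after fitting. The sketch below repeats the Gender values from the sample; the names `gender`, `encoder`, and `mapping` are illustrative only.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

gender = pd.Series(['Male', 'Female', 'Female', 'Male', 'Female'])

encoder = LabelEncoder()
encoded = encoder.fit_transform(gender)

# classes_ is sorted, and each label's position in it is its integer code
mapping = {str(label): code for code, label in enumerate(encoder.classes_)}
print(mapping)           # {'Female': 0, 'Male': 1}
print(encoded.tolist())  # [1, 0, 0, 1, 0]
```

The codes follow sorted label order, which is why Female maps to 0 and Male to 1 here.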
⚙️ Step 4: Train the Machine Learning Model
python
-------
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# Train-test split (with only 5 rows, test_size=0.2 leaves a single test row)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)
# Predict
y_pred = model.predict(X_test)
# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
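With five rows, a 20% test split holds one customer, so the reported accuracy can only be 0 or 1. One possible way to get a slightly less brittle estimate is cross-validation. The sketch below is an assumption about how you might set that up, not part of the original workflow: it rebuilds the sample, adds feature scaling, and uses two folds because the sample contains only two churned customers. All variable names are illustrative.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

sample = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
    'Churn': [0, 0, 1, 0, 1],
})
sample['Gender'] = sample['Gender'].map({'Female': 0, 'Male': 1})

X_cv = sample[['Gender', 'Age', 'MonthlyCharges', 'Tenure']]
y_cv = sample['Churn']

# Scale the features, then fit logistic regression, separately inside each fold
pipeline = make_pipeline(StandardScaler(), LogisticRegression())

# cv=2 because only two customers churned; for classifiers,
# an integer cv value uses stratified folds
scores = cross_val_score(pipeline, X_cv, y_cv, cv=2)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```

Even so, scores from a dataset this small are illustrative only; the approach becomes meaningful on a larger dataset.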
📊 Step 5: Visualize the Predictions
python
-------
import matplotlib.pyplot as plt
# Bar chart for churn prediction distribution
plt.bar(['Stayed', 'Churned'], [(y_pred == 0).sum(), (y_pred == 1).sum()])
plt.title('Customer Churn Prediction Results')
plt.ylabel('Number of Customers')
plt.show()
💡 Insights from AI Analytics
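Even this simple project can suggest patterns worth testing on a larger dataset, such as:
- Younger customers with lower tenure may have a higher chance of churning.
- Monthly charges may correlate with churn. In this five-row sample the two churned customers had the lowest charges (29.8 and 45.5), so the direction should be checked on more data.
- Female customers churned more often in the sample (2 of 3, versus 0 of 2 male customers).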
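One way to check observations like these against the data is a pair of `groupby` summaries. This is a minimal sketch on the same five-row sample; `df_check`, `means_by_churn`, and `churn_rate_by_gender` are names chosen for illustration.

```python
import pandas as pd

df_check = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
    'Churn': [0, 0, 1, 0, 1],
})

# Average of each numeric feature for stayed (0) vs churned (1) customers
means_by_churn = df_check.groupby('Churn')[['Age', 'MonthlyCharges', 'Tenure']].mean()
print(means_by_churn)

# Share of churned customers within each gender
churn_rate_by_gender = df_check.groupby('Gender')['Churn'].mean()
print(churn_rate_by_gender)
```

With only five rows these averages are illustrative rather than statistically meaningful, but the same two lines work unchanged on a larger dataset.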
📁 Summary
This Python AI analytics project covers:
- Real-world problem: Churn prediction
- Data preprocessing (label encoding)
- Model training with Logistic Regression
- Evaluation and visualization
You can scale this by:
- Using a full telecom dataset (e.g., Kaggle’s “Telco Customer Churn”)
- Trying different models (Random Forest, XGBoost)
- Adding feature engineering (e.g., contract type, payment method)
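As a rough sketch of the model-swap idea, the snippet below fits a Random Forest on the same sample and one-hot encodes Gender with `pd.get_dummies` instead of label encoding. The hyperparameters (`n_estimators=100`, `random_state=42`) and the variable names are assumptions for illustration, not tuned choices.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df_rf = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
    'Age': [34, 40, 23, 52, 30],
    'MonthlyCharges': [70.0, 89.5, 29.8, 56.2, 45.5],
    'Tenure': [12, 45, 2, 22, 10],
    'Churn': [0, 0, 1, 0, 1],
})

# One-hot encode Gender; the numeric columns pass through unchanged
X_rf = pd.get_dummies(df_rf.drop(columns='Churn'), columns=['Gender'])
y_rf = df_rf['Churn']

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_rf, y_rf)

# Relative importance of each feature according to the fitted forest
importances = pd.Series(forest.feature_importances_, index=X_rf.columns)
print(importances.sort_values(ascending=False))
```

On a full dataset you would evaluate the forest on held-out data, as in Step 4, before reading anything into the importances.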

