🧠
Module 2
Neural Networks & Deep Learning: Architecture & Training
⏱️ 12–18 hours · 📊 Intermediate · 🧩 3 Code Blocks · 🏗️ 1 Project
🎯 Learning Objectives
- ✓ Understand what a neural network is, starting from a single neuron and building up layer by layer.
- ✓ Learn activation functions, backpropagation, and optimizers — explained visually, not just mathematically.
- ✓ Build your first neural network with TensorFlow/Keras and train it step by step.
- ✓ Build CNNs for image classification and create a production-ready Customer Churn Prediction model.
📋 Prerequisites
- Module 1 completed
- Comfortable with Python and Scikit-Learn from Module 1
- No deep math background needed — we explain everything
📐 Technical Theory
🧠 What is a Neural Network? (The Big Picture)
Remember how in Module 1, Linear Regression was just: output = weight × input + bias? A neural network is simply many of these stacked together in layers!
A single artificial neuron works exactly like your brain cells (roughly):
1. It receives inputs (like pixel values of an image)
2. Each input is multiplied by a weight (how important is this input?)
3. All weighted inputs are summed together
4. The sum passes through an "activation function" (a decision gate)
5. The output becomes the input for the next neuron
Stack hundreds of these neurons in layers → that's a neural network!
Stack many layers → that's DEEP learning.
output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + bias) In plain English: output = activation(weighted_sum_of_all_inputs + bias)
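The five steps above fit in a few lines of NumPy. This is an illustrative toy (separate from the TensorFlow code later in the module), using ReLU as the activation gate:

```python
import numpy as np

def relu(z):
    # Step 4: the activation "gate" (ReLU passes positives, blocks negatives)
    return np.maximum(0.0, z)

def neuron(x, w, b):
    # Steps 1-3: multiply each input by its weight, sum, add bias; then the gate
    return relu(np.dot(w, x) + b)

x = np.array([0.5, -1.0, 2.0])  # inputs (e.g. three pixel values)
w = np.array([0.4, 0.3, 0.2])   # weights: how important each input is
b = 0.1                         # bias
print(neuron(x, w, b))          # 0.5*0.4 + (-1.0)*0.3 + 2.0*0.2 + 0.1 = 0.4
```

A layer is just many of these neurons sharing the same inputs, which is why frameworks implement it as one matrix multiplication.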
Activation Functions — The Decision Gates
Why do we need activation functions? Without them, no matter how many layers you stack, the whole network collapses into a single linear function (one big multiply-and-add), so it could never learn anything complex (like recognising a cat).
Activation functions add "curves" to the model, letting it learn complex patterns. Think of them as decision gates that decide whether a neuron should "fire" or not.
For beginners: just use ReLU in hidden layers and Sigmoid/Softmax in the output layer. That covers 90% of use cases!
| Function | Formula | Range | Best Used In |
|---|---|---|---|
| Sigmoid | 1/(1+e⁻ˣ) | (0, 1) | Binary output layer |
| Tanh | (eˣ-e⁻ˣ)/(eˣ+e⁻ˣ) | (-1, 1) | Hidden layers (RNNs) |
| ReLU | max(0, x) | [0, ∞) | Hidden layers (default) |
| Leaky ReLU | max(0.01x, x) | (-∞, ∞) | Avoids dying ReLU |
| Softmax | eˣⁱ / Σeˣʲ | (0,1), sums to 1 | Multi-class output |
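Each row of the table is only a line or two of NumPy, if you want to try the gates yourself (an illustrative sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes into (0, 1)

def tanh(x):
    return np.tanh(x)                      # squashes into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)              # zero for negatives, identity for positives

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope instead of a hard zero

def softmax(x):
    e = np.exp(x - np.max(x))              # subtract max for numerical stability
    return e / e.sum()                     # positive values that sum to 1

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))      # [0. 0. 3.]
print(softmax(z))   # a probability distribution over 3 classes
```

Note the max-subtraction trick in softmax: it changes nothing mathematically but prevents overflow for large inputs.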
Backpropagation — How Does the Network Learn?
This is the magic behind deep learning. Here's the intuition (no calculus degree needed!):
1. Forward Pass: Feed data through the network and get a prediction
2. Calculate Error: Compare prediction to the correct answer — "how wrong were we?"
3. Backward Pass: Go backward through each layer asking "which weights caused this mistake?"
4. Update Weights: Adjust each weight a tiny bit to reduce the error
5. Repeat thousands of times
It's like a teacher grading a test and then telling each student exactly which topics they need to study more. The "chain rule" from calculus is used to calculate how much each weight contributed — but TensorFlow handles all this math automatically!
∂L/∂W₁ = ∂L/∂ŷ × ∂ŷ/∂z₂ × ∂z₂/∂a₁ × ∂a₁/∂z₁ × ∂z₁/∂W₁ Don't panic! TensorFlow calculates this automatically. You just call model.fit().
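Here is the whole forward/backward/update loop in miniature for a single sigmoid neuron with squared-error loss, written out by hand (a toy sketch; TensorFlow runs this same logic automatically for millions of weights):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny "network": z = w*x + b, yhat = sigmoid(z), loss = (yhat - y)^2
x, y = 2.0, 1.0           # one training example: input 2.0, correct answer 1.0
w, b, lr = 0.5, 0.0, 0.1  # initial weight, bias, learning rate

for step in range(100):
    # 1. Forward pass
    z = w * x + b
    yhat = sigmoid(z)
    loss = (yhat - y) ** 2
    # 2-3. Backward pass: one chain-rule factor per arrow in the graph
    dL_dyhat = 2 * (yhat - y)
    dyhat_dz = yhat * (1 - yhat)      # derivative of sigmoid
    dL_dw = dL_dyhat * dyhat_dz * x   # dz/dw = x
    dL_db = dL_dyhat * dyhat_dz       # dz/db = 1
    # 4. Update: nudge each weight against its gradient
    w -= lr * dL_dw
    b -= lr * dL_db

print(f"final loss: {loss:.5f}")  # steadily shrinks toward 0
```

Run it and watch the loss fall each iteration: that is backpropagation, just with one weight instead of millions.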
Optimizers — Choosing How to Learn
Remember Gradient Descent from Module 1? Optimizers are smarter versions of it. They control HOW the model updates its weights.
For beginners: just use Adam. It's the default for a reason — it automatically adjusts the learning speed for each weight and works great 90% of the time. You can explore others as you gain experience.
| Optimizer | Key Idea | When to Use |
|---|---|---|
| SGD | Raw gradient step (optionally with momentum) | Simple tasks, strong regularization |
| Adam | Adapts LR per-parameter | Default for most networks |
| AdaGrad | Reduces LR for frequent features | Sparse data (NLP) |
| RMSprop | Moving avg of squared gradients | RNNs |
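To make the table concrete, here are the SGD and Adam update rules written out for a single parameter minimising f(w) = w² (a toy sketch of the math the Keras optimizers implement for you):

```python
import numpy as np

# Objective: f(w) = w**2, gradient g = 2*w, minimum at w = 0.

def sgd_step(w, g, lr=0.1):
    # SGD: step straight down the raw gradient
    return w - lr * g

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: moving averages of the gradient (m) and squared gradient (v),
    # bias-corrected, give each parameter its own effective step size
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

w_sgd, w_adam, state = 5.0, 5.0, (0.0, 0.0, 0)
for _ in range(50):
    w_sgd = sgd_step(w_sgd, 2 * w_sgd)
    w_adam, state = adam_step(w_adam, 2 * w_adam, state)
print(w_sgd, w_adam)  # both head toward the minimum at 0
```

Notice Adam takes roughly constant-size steps regardless of the gradient's magnitude, which is exactly the "adapts LR per-parameter" behaviour in the table.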
Batch Normalization & Dropout — Tricks That Work
Two powerful techniques that make neural networks train better:
Batch Normalization: Normalises the data flowing between layers so the numbers don't get too big or too small. Think of it as "resetting the scale" at each layer. Result: faster training and more stable.
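"Resetting the scale" is easy to see in NumPy. This is a simplified sketch of the training-time computation only (the real Keras layer also learns gamma/beta and tracks running statistics for inference):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Normalize each feature across the batch, then let the
    # learnable gamma/beta rescale and shift the result
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

# One feature on a small scale, one on a huge scale
x = np.array([[1.0, 200.0],
              [3.0, 400.0],
              [5.0, 600.0]])
out = batch_norm(x)
print(out.mean(axis=0))  # ~[0, 0]: both features recentered
print(out.std(axis=0))   # ~[1, 1]: and now on the same scale
```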
Dropout: Randomly turns off some neurons during training (e.g., 40% of them each step). Why? This forces the network to not rely on any single neuron — like a sports team where everyone needs to be good, not just one star player. This prevents overfitting!
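The "randomly turn off neurons" idea is a few lines of NumPy. This sketches inverted dropout, the variant Keras uses: surviving activations are scaled up so the expected total stays the same, and the layer becomes a no-op at inference time:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, rate, training=True):
    if not training:
        return a                          # inference: layer does nothing
    mask = rng.random(a.shape) >= rate    # keep each neuron with prob 1-rate
    return a * mask / (1.0 - rate)        # scale up so the expected sum is unchanged

a = np.ones(10000)
out = dropout(a, rate=0.4)
print(out.mean())  # ~1.0 on average, even though ~40% of neurons were zeroed
```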
💻 Code Implementation
Step 1: Building a Feedforward Neural Network
```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, callbacks
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

# ── Reproducibility ───────────────────────────────────────
tf.random.set_seed(42)
np.random.seed(42)
print(f"TensorFlow Version: {tf.__version__}")

# ── Generate Synthetic Dataset ────────────────────────────
X, y = make_classification(
    n_samples=10000, n_features=20, n_informative=15,
    n_redundant=5, n_classes=2, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# ── Build the Model ───────────────────────────────────────
def build_model(input_dim, learning_rate=0.001):
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(256, use_bias=False),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dropout(0.4),
        layers.Dense(128, use_bias=False),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),
    ], name="deep_classifier")
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy", keras.metrics.AUC(name="auc")]
    )
    return model

model = build_model(input_dim=X_train.shape[1])
model.summary()
```

🔧 Troubleshooting
❌ Error: ModuleNotFoundError: No module named 'tensorflow' · 🔍 Cause: TensorFlow not installed · ✅ Fix: pip install tensorflow
❌ Error: CUDA out of memory · 🔍 Cause: GPU memory exhausted · ✅ Fix: Reduce batch_size, use tf.keras.mixed_precision
Step 2: Training with Callbacks
```python
# ── Define Callbacks ──────────────────────────────────────
cb_early_stop = callbacks.EarlyStopping(
    monitor="val_loss", patience=15,
    restore_best_weights=True, verbose=1
)
cb_reduce_lr = callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5,
    patience=7, min_lr=1e-6, verbose=1
)
cb_checkpoint = callbacks.ModelCheckpoint(
    filepath="best_model.keras",
    monitor="val_auc", save_best_only=True,
    mode="max", verbose=1
)

# ── Train ─────────────────────────────────────────────────
history = model.fit(
    X_train, y_train,
    epochs=200, batch_size=256,
    validation_split=0.15,
    callbacks=[cb_early_stop, cb_reduce_lr, cb_checkpoint],
    verbose=1
)

# ── Evaluate ──────────────────────────────────────────────
test_loss, test_acc, test_auc = model.evaluate(X_test, y_test, verbose=0)
print(f"\n🏆 Test Accuracy : {test_acc:.4f}")
print(f"🏆 Test AUC      : {test_auc:.4f}")

# ── Plot Training Curves ──────────────────────────────────
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for ax, metric, title in zip(axes, ["loss", "accuracy", "auc"], ["Loss", "Accuracy", "AUC"]):
    ax.plot(history.history[metric], label="Train", linewidth=2)
    ax.plot(history.history[f"val_{metric}"], label="Val", linewidth=2, linestyle="--")
    ax.set_title(f"Training {title}", fontsize=13)
    ax.set_xlabel("Epoch"); ax.legend(); ax.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("training_curves.png", dpi=150)
plt.show()
```

🔧 Troubleshooting
❌ Error: Loss is NaN from epoch 1 · 🔍 Cause: Exploding gradients or a bad learning rate · ✅ Fix: Use clipnorm=1.0 in the optimizer, lower the learning rate
❌ Error: Validation loss increases immediately · 🔍 Cause: Overfitting · ✅ Fix: Increase the Dropout rate, add L2 regularization
Step 3: CNN for Image Classification (CIFAR-10)
```python
from tensorflow.keras.datasets import cifar10

# ── Load and Normalize ────────────────────────────────────
(X_train_c, y_train_c), (X_test_c, y_test_c) = cifar10.load_data()
X_train_c = X_train_c.astype("float32") / 255.0
X_test_c = X_test_c.astype("float32") / 255.0
CLASS_NAMES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck"
]

# ── Data Augmentation ─────────────────────────────────────
# Included as the first layer of the model so that fresh random
# transforms are applied to every batch during training, and the
# layers are automatically skipped at inference time.
data_aug = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.1, 0.1),
], name="augmentation")

# ── Build CNN ─────────────────────────────────────────────
def build_cnn():
    model = keras.Sequential([
        layers.Input(shape=(32, 32, 3)),
        data_aug,
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Dropout(0.25),
        layers.Conv2D(128, (3, 3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.GlobalAveragePooling2D(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(10, activation="softmax"),
    ], name="cifar10_cnn")
    model.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model

cnn = build_cnn()
cnn.summary()

# ── Train ─────────────────────────────────────────────────
cnn_history = cnn.fit(
    X_train_c, y_train_c,
    epochs=50, batch_size=128,
    validation_data=(X_test_c, y_test_c),
    callbacks=[
        callbacks.EarlyStopping(monitor="val_accuracy", patience=10, restore_best_weights=True),
        callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5)
    ], verbose=1
)
_, test_accuracy = cnn.evaluate(X_test_c, y_test_c, verbose=0)
print(f"\n🏆 CIFAR-10 Test Accuracy: {test_accuracy * 100:.2f}%")
```

🔧 Troubleshooting
❌ Error: ResourceExhaustedError · 🔍 Cause: Dataset too large for RAM · ✅ Fix: Use tf.data.Dataset with .prefetch() and .cache()
❌ Error: Training very slow on CPU · 🔍 Cause: No GPU available · ✅ Fix: Reduce model size, use smaller batches, try Google Colab
🏗️ Practical Project
Customer Churn Prediction
Build a deep learning model that predicts whether a telecom customer will leave. Handles class imbalance with class weights and uses a production-ready inference function with risk classification.
```python
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, callbacks
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import classification_report

# ── Simulated Telco Churn Dataset ─────────────────────────
np.random.seed(42)
n = 7000
tenure = np.random.randint(1, 72, n)
monthly_charges = np.round(np.random.uniform(20, 120, n), 2)
df = pd.DataFrame({
    "tenure": tenure,
    "monthly_charges": monthly_charges,
    "total_charges": np.round(np.random.uniform(100, 8000, n), 2),
    "num_products": np.random.randint(1, 6, n),
    "tech_support": np.random.choice([0, 1], n),
    "online_backup": np.random.choice([0, 1], n),
    "senior_citizen": np.random.choice([0, 1], n, p=[0.84, 0.16]),
    "contract_type": np.random.choice(["Month", "One_Year", "Two_Year"], n),
    "internet_service": np.random.choice(["DSL", "Fiber", "No"], n),
})
# Churn depends on the actual features (high charges + short tenure), plus 5% noise,
# so the model has a real signal to learn
df["churn"] = ((np.random.uniform(0, 1, n) < 0.05) |
               ((monthly_charges > 80) & (tenure < 20))).astype(int)

le = LabelEncoder()
df["contract_type"] = le.fit_transform(df["contract_type"])
df["internet_service"] = le.fit_transform(df["internet_service"])
df["charge_per_month"] = df["total_charges"] / (df["tenure"] + 1)
X = df.drop("churn", axis=1).values
y = df["churn"].values
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# ── Class Weights for Imbalanced Data ─────────────────────
neg, pos = np.bincount(y_train)
class_weight = {0: (1 / neg) * (len(y_train) / 2.0), 1: (1 / pos) * (len(y_train) / 2.0)}

# ── Build Model ───────────────────────────────────────────
churn_model = keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, use_bias=False),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.35),
    layers.Dense(64, use_bias=False),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.25),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
], name="churn_predictor")
churn_model.compile(
    optimizer=keras.optimizers.Adam(5e-4),
    loss="binary_crossentropy",
    metrics=["accuracy", keras.metrics.AUC(name="auc")]
)

# ── Train ─────────────────────────────────────────────────
history = churn_model.fit(
    X_train, y_train, epochs=100, batch_size=64,
    validation_split=0.15, class_weight=class_weight,
    callbacks=[callbacks.EarlyStopping(
        monitor="val_auc", patience=15,
        restore_best_weights=True, mode="max"
    )], verbose=0
)
y_prob = churn_model.predict(X_test).flatten()
y_pred = (y_prob >= 0.50).astype(int)
print("\n📊 Classification Report:")
print(classification_report(y_test, y_pred, target_names=["No Churn", "Churn"]))

# ── Production Inference ──────────────────────────────────
def predict_churn(customer_features: np.ndarray) -> dict:
    # Expects raw (unscaled) features; scaling happens inside
    scaled = scaler.transform(customer_features.reshape(1, -1))
    prob = float(churn_model.predict(scaled, verbose=0)[0][0])
    risk = "🔴 HIGH" if prob > 0.70 else "🟡 MEDIUM" if prob > 0.40 else "🟢 LOW"
    return {"churn_probability": round(prob, 4), "risk_level": risk}

sample = X_test[0]
result = predict_churn(scaler.inverse_transform(sample.reshape(1, -1))[0])
print(f"\nChurn Probability: {result['churn_probability']:.1%}")
print(f"Risk: {result['risk_level']}")
```

🔧 Troubleshooting
❌ Error: Accuracy stuck at ~50% · 🔍 Cause: Class imbalance · ✅ Fix: Verify class_weight is being passed correctly
❌ Error: Accuracy stuck at the class distribution · 🔍 Cause: Model predicts the majority class only · ✅ Fix: Balance the dataset, use a weighted loss, check label encoding
Topics Covered
#NeuralNetworks #CNN #Backpropagation #TensorFlow #Keras #Adam