Module 2

Neural Networks & Deep Learning: Architecture & Training

⏱️ 12–18 hours · 📊 Intermediate · 🧩 3 Code Blocks · 🏗️ 1 Project

🎯 Learning Objectives

  • Understand what a neural network is, starting from a single neuron and building up layer by layer.
  • Learn activation functions, backpropagation, and optimizers — explained visually, not just mathematically.
  • Build your first neural network with TensorFlow/Keras and train it step by step.
  • Build CNNs for image classification and create a production-ready Customer Churn Prediction model.

📋 Prerequisites

  • Module 1 completed
  • Comfortable with Python and Scikit-Learn from Module 1
  • No deep math background needed — we explain everything

📐 Technical Theory

🧠 What is a Neural Network? (The Big Picture)

Remember how in Module 1, Linear Regression was just output = weight × input + bias? A neural network is simply many of these stacked together in layers! A single artificial neuron works (roughly) like one of your brain cells:

1. It receives inputs (like the pixel values of an image)
2. Each input is multiplied by a weight (how important is this input?)
3. All weighted inputs are summed together
4. The sum passes through an "activation function" (a decision gate)
5. The output becomes an input for the next neuron

Stack hundreds of these neurons in layers → that's a neural network! Stack many layers → that's DEEP learning.
output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + bias)

In plain English: output = activation(weighted_sum_of_all_inputs + bias)
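The five steps above can be sketched in a few lines of plain NumPy (a toy illustration with made-up weights, not TensorFlow code):

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum + bias, then a ReLU gate."""
    z = np.dot(w, x) + b          # weighted sum of all inputs
    return max(0.0, z)            # ReLU activation: "fire" only if positive

x = np.array([0.5, -1.2, 3.0])   # inputs (e.g. three pixel values)
w = np.array([0.4, 0.1, 0.8])    # learned weights: how important is each input?
b = -0.5                         # learned bias

print(neuron(x, w, b))           # this output feeds the next layer's neurons
```

A Dense layer in Keras is just many of these neurons computed at once as a matrix multiplication.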

Activation Functions — The Decision Gates

Why do we need activation functions? Without them, no matter how many layers you stack, the entire network would collapse into one linear transformation — it couldn't learn anything complex (like recognising a cat). Activation functions add "curves" (non-linearity) to the model, letting it learn complex patterns. Think of them as decision gates that decide whether a neuron should "fire" or not.

For beginners: just use ReLU in hidden layers and Sigmoid/Softmax in the output layer. That covers 90% of use cases!
| Function | Formula | Range | Best Used In |
| --- | --- | --- | --- |
| Sigmoid | 1/(1+e⁻ˣ) | (0, 1) | Binary output layer |
| Tanh | (eˣ−e⁻ˣ)/(eˣ+e⁻ˣ) | (−1, 1) | Hidden layers (RNNs) |
| ReLU | max(0, x) | [0, ∞) | Hidden layers (default) |
| Leaky ReLU | max(0.01x, x) | (−∞, ∞) | Avoids dying ReLU |
| Softmax | eˣⁱ / Σeˣʲ | (0, 1), sums to 1 | Multi-class output |
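The formulas in the table translate almost line-for-line into NumPy (a quick sketch for intuition, not what Keras runs internally):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes any number into (0, 1)

def relu(x):
    return np.maximum(0.0, x)         # zero for negatives, identity otherwise

def softmax(x):
    e = np.exp(x - np.max(x))         # subtract max for numerical stability
    return e / e.sum()                # outputs sum to 1: a probability vector

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z))
print(relu(z))
print(softmax(z), softmax(z).sum())   # the softmax outputs sum to (numerically) 1
```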

Backpropagation — How Does the Network Learn?

This is the magic behind deep learning. Here's the intuition (no calculus degree needed!):

1. Forward Pass: Feed data through the network and get a prediction
2. Calculate Error: Compare the prediction to the correct answer — "how wrong were we?"
3. Backward Pass: Go backward through each layer asking "which weights caused this mistake?"
4. Update Weights: Adjust each weight a tiny bit to reduce the error
5. Repeat thousands of times

It's like a teacher grading a test and then telling each student exactly which topics they need to study more. The "chain rule" from calculus is used to calculate how much each weight contributed — but TensorFlow handles all this math automatically!
∂L/∂W₁ = ∂L/∂ŷ × ∂ŷ/∂a₂ × ... × ∂z₁/∂W₁

Don't panic! TensorFlow calculates this automatically. You just call model.fit().
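To make the chain rule concrete, here is the tiniest possible example: one weight, one sigmoid neuron, one training sample, with the gradient computed by hand and checked numerically. A toy sketch; in practice model.fit() does all of this for you:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y_true, w = 2.0, 1.0, 0.3

# Forward pass
z = w * x
y_hat = sigmoid(z)
loss = (y_hat - y_true) ** 2          # squared error

# Backward pass: the chain rule, term by term
dL_dyhat = 2 * (y_hat - y_true)       # ∂L/∂ŷ
dyhat_dz = y_hat * (1 - y_hat)        # ∂ŷ/∂z  (derivative of sigmoid)
dz_dw = x                             # ∂z/∂w
grad = dL_dyhat * dyhat_dz * dz_dw    # ∂L/∂w

# Numerical check: nudge w slightly and watch how the loss moves
eps = 1e-6
loss_plus = (sigmoid((w + eps) * x) - y_true) ** 2
numeric = (loss_plus - loss) / eps
print(grad, numeric)                  # the two values agree closely

# Update step (this is what the optimizer does, thousands of times)
w -= 0.1 * grad
```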

Optimizers — Choosing How to Learn

Remember Gradient Descent from Module 1? Optimizers are smarter versions of it. They control HOW the model updates its weights. For beginners: just use Adam. It's the default for a reason — it automatically adjusts the learning speed for each weight and works great 90% of the time. You can explore others as you gain experience.
| Optimizer | Key Idea | When to Use |
| --- | --- | --- |
| SGD | Raw gradient step (+ optional momentum) | Simple tasks, strong regularization |
| Adam | Adapts the LR per parameter | Default for most networks |
| AdaGrad | Reduces the LR for frequent features | Sparse data (NLP) |
| RMSprop | Moving average of squared gradients | RNNs |
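To see how the table's ideas differ in code, here are the raw update rules for SGD-with-momentum and Adam written out in NumPy (a didactic sketch of a single step; real frameworks apply these same formulas to every weight):

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """Velocity accumulates past gradients; the step follows the velocity."""
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, grad, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam keeps a moving average of gradients (m) and of squared
    gradients (s), then scales the step per parameter: larger steps in
    flat directions, smaller steps in steep or noisy ones."""
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# One update on a single weight with gradient 0.5
w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, 0.5, v)
print(w)       # moved by lr * grad = 0.005

w2, m, s = 1.0, 0.0, 0.0
w2, m, s = adam_step(w2, 0.5, m, s, t=1)
print(w2)      # Adam's first step is ≈ lr, regardless of the gradient's size
```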

Batch Normalization & Dropout — Tricks That Work

Two powerful techniques that make neural networks train better:

Batch Normalization: Normalises the data flowing between layers so the numbers don't get too big or too small. Think of it as "resetting the scale" at each layer. Result: faster, more stable training.

Dropout: Randomly turns off some neurons during training (e.g., 40% of them each step). Why? This forces the network not to rely on any single neuron — like a sports team where everyone needs to be good, not just one star player. This prevents overfitting!
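Dropout is simple enough to write out by hand. Here is a sketch of "inverted dropout" (the variant Keras uses): during training some activations are zeroed and the survivors are scaled up so the expected value is unchanged; at inference the layer does nothing:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, rate=0.4, training=True):
    if not training:
        return a                            # inference: the layer is a no-op
    mask = rng.random(a.shape) >= rate      # keep each activation with prob 1 - rate
    return a * mask / (1.0 - rate)          # scale survivors so E[output] == input

a = np.ones(100_000)
out = dropout(a, rate=0.4)
print((out == 0).mean())    # ≈ 0.4 of the activations were switched off
print(out.mean())           # ≈ 1.0, so the expected value is preserved
```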

💻 Code Implementation

Step 1: Building a Feedforward Neural Network

python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, callbacks
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

# ── Reproducibility ───────────────────────────────────────
tf.random.set_seed(42)
np.random.seed(42)
print(f"TensorFlow Version: {tf.__version__}")

# ── Generate Synthetic Dataset ────────────────────────────
X, y = make_classification(
    n_samples=10000, n_features=20, n_informative=15,
    n_redundant=5, n_classes=2, random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

# ── Build the Model ───────────────────────────────────────
def build_model(input_dim, learning_rate=0.001):
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(256, use_bias=False),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dropout(0.4),
        layers.Dense(128, use_bias=False),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),
    ], name="deep_classifier")

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy", keras.metrics.AUC(name="auc")]
    )
    return model

model = build_model(input_dim=X_train.shape[1])
model.summary()

🔧 Troubleshooting

| ❌ Error | 🔍 Cause | ✅ Fix |
| --- | --- | --- |
| ModuleNotFoundError: No module named "tensorflow" | TensorFlow not installed | pip install tensorflow |
| CUDA out of memory | GPU memory exhausted | Reduce batch_size, use tf.keras.mixed_precision |

Step 2: Training with Callbacks

python
# ── Define Callbacks ──────────────────────────────────────
cb_early_stop = callbacks.EarlyStopping(
    monitor="val_loss", patience=15,
    restore_best_weights=True, verbose=1
)
cb_reduce_lr = callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5,
    patience=7, min_lr=1e-6, verbose=1
)
cb_checkpoint = callbacks.ModelCheckpoint(
    filepath="best_model.keras",
    monitor="val_auc", save_best_only=True,
    mode="max", verbose=1
)

# ── Train ─────────────────────────────────────────────────
history = model.fit(
    X_train, y_train,
    epochs=200, batch_size=256,
    validation_split=0.15,
    callbacks=[cb_early_stop, cb_reduce_lr, cb_checkpoint],
    verbose=1
)

# ── Evaluate ──────────────────────────────────────────────
test_loss, test_acc, test_auc = model.evaluate(X_test, y_test, verbose=0)
print(f"\n🏆 Test Accuracy : {test_acc:.4f}")
print(f"🏆 Test AUC      : {test_auc:.4f}")

# ── Plot Training Curves ──────────────────────────────────
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for ax, metric, title in zip(axes, ["loss","accuracy","auc"], ["Loss","Accuracy","AUC"]):
    ax.plot(history.history[metric],        label="Train", linewidth=2)
    ax.plot(history.history[f"val_{metric}"], label="Val", linewidth=2, linestyle="--")
    ax.set_title(f"Training {title}", fontsize=13)
    ax.set_xlabel("Epoch"); ax.legend(); ax.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("training_curves.png", dpi=150)
plt.show()

🔧 Troubleshooting

| ❌ Error | 🔍 Cause | ✅ Fix |
| --- | --- | --- |
| Loss is NaN from epoch 1 | Exploding gradients or bad LR | Use clipnorm=1.0 in the optimizer, lower the learning rate |
| Validation loss increases immediately | Overfitting | Increase the Dropout rate, add L2 regularization |

Step 3: CNN for Image Classification (CIFAR-10)

python
from tensorflow.keras.datasets import cifar10

# ── Load and Normalize ────────────────────────────────────
(X_train_c, y_train_c), (X_test_c, y_test_c) = cifar10.load_data()
X_train_c = X_train_c.astype("float32") / 255.0
X_test_c  = X_test_c.astype("float32")  / 255.0

CLASS_NAMES = [
    "airplane","automobile","bird","cat","deer",
    "dog","frog","horse","ship","truck"
]

# ── Build CNN ─────────────────────────────────────────────
def build_cnn():
    model = keras.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, (3,3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(32, (3,3), padding="same", activation="relu"),
        layers.MaxPooling2D((2,2)),
        layers.Dropout(0.25),

        layers.Conv2D(64, (3,3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3,3), padding="same", activation="relu"),
        layers.MaxPooling2D((2,2)),
        layers.Dropout(0.25),

        layers.Conv2D(128, (3,3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.GlobalAveragePooling2D(),

        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(10, activation="softmax"),
    ], name="cifar10_cnn")

    model.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model

cnn = build_cnn()
cnn.summary()

# ── Data Augmentation ─────────────────────────────────────
data_aug = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.1, 0.1),
], name="augmentation")

# ── Train ─────────────────────────────────────────────────
# Augmentation must run per batch inside the input pipeline:
# calling data_aug() once on the whole array outside fit()
# runs the layers in inference mode, i.e. no augmentation at all.
AUTOTUNE = tf.data.AUTOTUNE
train_ds = (
    tf.data.Dataset.from_tensor_slices((X_train_c, y_train_c))
    .shuffle(10_000).batch(128)
    .map(lambda x, y: (data_aug(x, training=True), y),
         num_parallel_calls=AUTOTUNE)
    .prefetch(AUTOTUNE)
)
cnn_history = cnn.fit(
    train_ds, epochs=50,
    validation_data=(X_test_c, y_test_c),
    callbacks=[
        callbacks.EarlyStopping(monitor="val_accuracy", patience=10, restore_best_weights=True),
        callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5)
    ], verbose=1
)
_, test_accuracy = cnn.evaluate(X_test_c, y_test_c, verbose=0)
print(f"\n🏆 CIFAR-10 Test Accuracy: {test_accuracy * 100:.2f}%")

🔧 Troubleshooting

| ❌ Error | 🔍 Cause | ✅ Fix |
| --- | --- | --- |
| ResourceExhaustedError | Dataset too large for memory | Use tf.data.Dataset with .prefetch() and .cache() |
| Training very slow on CPU | No GPU available | Reduce model size, use smaller batches, try Google Colab |

🏗️ Practical Project

Customer Churn Prediction

Build a deep learning model that predicts whether a telecom customer will leave. Handles class imbalance with class weights and uses a production-ready inference function with risk classification.

python
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, callbacks
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import classification_report

# ── Simulated Telco Churn Dataset ─────────────────────────
np.random.seed(42)
n = 7000
tenure          = np.random.randint(1, 72, n)
monthly_charges = np.round(np.random.uniform(20, 120, n), 2)

df = pd.DataFrame({
    "tenure": tenure,
    "monthly_charges": monthly_charges,
    "total_charges": np.round(np.random.uniform(100, 8000, n), 2),
    "num_products": np.random.randint(1, 6, n),
    "tech_support": np.random.choice([0, 1], n),
    "online_backup": np.random.choice([0, 1], n),
    "senior_citizen": np.random.choice([0, 1], n, p=[0.84, 0.16]),
    "contract_type": np.random.choice(["Month","One_Year","Two_Year"], n),
    "internet_service": np.random.choice(["DSL","Fiber","No"], n),
    # Churn depends on the actual feature columns (plus 5% random noise)
    # so the model has real signal to learn: new customers on high
    # monthly charges are the ones who leave.
    "churn": ((np.random.uniform(0, 1, n) < 0.05) |
              ((monthly_charges > 80) & (tenure < 20))).astype(int)
})

le = LabelEncoder()
df["contract_type"]    = le.fit_transform(df["contract_type"])
df["internet_service"] = le.fit_transform(df["internet_service"])
df["charge_per_month"] = df["total_charges"] / (df["tenure"] + 1)

X = df.drop("churn", axis=1).values
y = df["churn"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

# ── Class Weights for Imbalanced Data ─────────────────────
neg, pos = np.bincount(y_train)
class_weight = {0: (1/neg)*(len(y_train)/2.0), 1: (1/pos)*(len(y_train)/2.0)}

# ── Build Model ───────────────────────────────────────────
churn_model = keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, use_bias=False),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.35),
    layers.Dense(64, use_bias=False),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.25),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
], name="churn_predictor")

churn_model.compile(
    optimizer=keras.optimizers.Adam(5e-4),
    loss="binary_crossentropy",
    metrics=["accuracy", keras.metrics.AUC(name="auc")]
)

# ── Train ─────────────────────────────────────────────────
history = churn_model.fit(
    X_train, y_train, epochs=100, batch_size=64,
    validation_split=0.15, class_weight=class_weight,
    callbacks=[callbacks.EarlyStopping(
        monitor="val_auc", patience=15,
        restore_best_weights=True, mode="max"
    )], verbose=0
)

y_prob = churn_model.predict(X_test).flatten()
y_pred = (y_prob >= 0.50).astype(int)
print("\n📊 Classification Report:")
print(classification_report(y_test, y_pred, target_names=["No Churn", "Churn"]))

# ── Production Inference ──────────────────────────────────
def predict_churn(customer_features: np.ndarray) -> dict:
    scaled = scaler.transform(customer_features.reshape(1, -1))
    prob = float(churn_model.predict(scaled, verbose=0)[0][0])
    risk = "🔴 HIGH" if prob > 0.70 else "🟡 MEDIUM" if prob > 0.40 else "🟢 LOW"
    return {"churn_probability": round(prob, 4), "risk_level": risk}

# X_test is already scaled, so invert back to raw features for the demo
sample_raw = scaler.inverse_transform(X_test[0].reshape(1, -1))[0]
result = predict_churn(sample_raw)
print(f"\nChurn Probability: {result['churn_probability']:.1%}")
print(f"Risk: {result['risk_level']}")

🔧 Troubleshooting

| ❌ Error | 🔍 Cause | ✅ Fix |
| --- | --- | --- |
| Accuracy stuck at ~50% | Class imbalance | Verify class_weight is being passed correctly |
| Accuracy stuck at the class distribution | Model predicts the majority class only | Balance the dataset, use a weighted loss, check label encoding |
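The first fix above says to verify the class weights. scikit-learn ships a helper that computes the same "balanced" weighting as the manual formula in the project code; a quick sanity check with toy labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0] * 900 + [1] * 100)   # heavily imbalanced toy labels

weights = compute_class_weight(
    "balanced", classes=np.array([0, 1]), y=y_train
)
class_weight = dict(enumerate(weights))     # pass this dict to model.fit()
print(class_weight)                          # the minority class gets the larger weight
```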

Topics Covered

#NeuralNetworks #CNN #Backpropagation #TensorFlow #Keras #Adam