Module 2

Neural Networks & Deep Learning: Architecture & Training

⏱️ 12–18 hours · 📊 Intermediate · 🧩 3 Code Blocks · 🏗️ 1 Project

🎯 Learning Objectives

  • Understand what a neural network is, starting from a single neuron and building up layer by layer.
  • Learn activation functions, backpropagation, and optimizers — explained visually, not just mathematically.
  • Build your first neural network with TensorFlow/Keras and train it step by step.
  • Build CNNs for image classification and create a production-ready Customer Churn Prediction model.

📋 Prerequisites

  • Module 1 completed
  • Comfortable with Python and Scikit-Learn from Module 1
  • No deep math background needed — we explain everything

📐 Technical Theory

🧠 What is a Neural Network? (The Big Picture)

Remember how in Module 1, Linear Regression was just output = weight × input + bias? A neural network is simply many of these stacked together in layers! A single artificial neuron works (roughly) like one of your brain cells:

1. It receives inputs (like the pixel values of an image)
2. Each input is multiplied by a weight (how important is this input?)
3. All weighted inputs are summed together
4. The sum passes through an "activation function" (a decision gate)
5. The output becomes an input for the next neuron

Stack hundreds of these neurons in layers → that's a neural network! Stack many layers → that's DEEP learning.
output = activation(w₁x₁ + w₂x₂ + ... + wₙxₙ + bias)

In plain English: output = activation(weighted_sum_of_all_inputs + bias)
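The five steps above can be sketched in a few lines of plain NumPy (a toy illustration with made-up weights, not TensorFlow code):

```python
import numpy as np

def neuron(x, w, b):
    """One artificial neuron: weighted sum + bias, then a ReLU gate."""
    z = np.dot(w, x) + b          # weighted sum of all inputs
    return max(0.0, z)            # ReLU activation: "fire" only if positive

x = np.array([0.5, -1.2, 3.0])   # inputs (e.g. three pixel values)
w = np.array([0.4, 0.1, 0.8])    # learned weights: how important is each input?
b = -0.5                         # learned bias

print(neuron(x, w, b))           # this output feeds the next layer's neurons
```

A Dense layer in Keras is just many of these neurons computed at once as a matrix multiplication.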

Activation Functions — The Decision Gates

Why do we need activation functions? Without them, no matter how many layers you stack, the entire network would collapse into one linear transformation — it couldn't learn anything complex (like recognising a cat). Activation functions add "curves" (non-linearity) to the model, letting it learn complex patterns. Think of them as decision gates that decide whether a neuron should "fire" or not.

For beginners: just use ReLU in hidden layers and Sigmoid/Softmax in the output layer. That covers 90% of use cases!
| Function | Formula | Range | Best Used In |
| --- | --- | --- | --- |
| Sigmoid | 1/(1+e⁻ˣ) | (0, 1) | Binary output layer |
| Tanh | (eˣ−e⁻ˣ)/(eˣ+e⁻ˣ) | (−1, 1) | Hidden layers (RNNs) |
| ReLU | max(0, x) | [0, ∞) | Hidden layers (default) |
| Leaky ReLU | max(0.01x, x) | (−∞, ∞) | Avoids dying ReLU |
| Softmax | eˣⁱ / Σeˣʲ | (0, 1), sums to 1 | Multi-class output |
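The formulas in the table translate almost line-for-line into NumPy (a quick sketch for intuition, not what Keras runs internally):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # squashes any number into (0, 1)

def relu(x):
    return np.maximum(0.0, x)         # zero for negatives, identity otherwise

def softmax(x):
    e = np.exp(x - np.max(x))         # subtract max for numerical stability
    return e / e.sum()                # outputs sum to 1: a probability vector

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z))
print(relu(z))
print(softmax(z), softmax(z).sum())   # the softmax outputs sum to (numerically) 1
```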

Backpropagation — How Does the Network Learn?

This is the magic behind deep learning. Here's the intuition (no calculus degree needed!):

1. Forward Pass: Feed data through the network and get a prediction
2. Calculate Error: Compare the prediction to the correct answer — "how wrong were we?"
3. Backward Pass: Go backward through each layer asking "which weights caused this mistake?"
4. Update Weights: Adjust each weight a tiny bit to reduce the error
5. Repeat thousands of times

It's like a teacher grading a test and then telling each student exactly which topics they need to study more. The "chain rule" from calculus is used to calculate how much each weight contributed — but TensorFlow handles all this math automatically!
∂L/∂W₁ = ∂L/∂ŷ × ∂ŷ/∂a₂ × ... × ∂z₁/∂W₁

Don't panic! TensorFlow calculates this automatically. You just call model.fit().
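To make the chain rule concrete, here is the tiniest possible example: one weight, one sigmoid neuron, one training sample, with the gradient computed by hand and checked numerically. A toy sketch; in practice model.fit() does all of this for you:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y_true, w = 2.0, 1.0, 0.3

# Forward pass
z = w * x
y_hat = sigmoid(z)
loss = (y_hat - y_true) ** 2          # squared error

# Backward pass: the chain rule, term by term
dL_dyhat = 2 * (y_hat - y_true)       # ∂L/∂ŷ
dyhat_dz = y_hat * (1 - y_hat)        # ∂ŷ/∂z  (derivative of sigmoid)
dz_dw = x                             # ∂z/∂w
grad = dL_dyhat * dyhat_dz * dz_dw    # ∂L/∂w

# Numerical check: nudge w slightly and watch how the loss moves
eps = 1e-6
loss_plus = (sigmoid((w + eps) * x) - y_true) ** 2
numeric = (loss_plus - loss) / eps
print(grad, numeric)                  # the two values agree closely

# Update step (this is what the optimizer does, thousands of times)
w -= 0.1 * grad
```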

Optimizers — Choosing How to Learn

Remember Gradient Descent from Module 1? Optimizers are smarter versions of it. They control HOW the model updates its weights. For beginners: just use Adam. It's the default for a reason — it automatically adjusts the learning speed for each weight and works great 90% of the time. You can explore others as you gain experience.
| Optimizer | Key Idea | When to Use |
| --- | --- | --- |
| SGD | Raw gradient step (+ optional momentum) | Simple tasks, strong regularization |
| Adam | Adapts the LR per parameter | Default for most networks |
| AdaGrad | Reduces the LR for frequent features | Sparse data (NLP) |
| RMSprop | Moving average of squared gradients | RNNs |
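To see how the table's ideas differ in code, here are the raw update rules for SGD-with-momentum and Adam written out in NumPy (a didactic sketch of a single step; real frameworks apply these same formulas to every weight):

```python
import numpy as np

def sgd_momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """Velocity accumulates past gradients; the step follows the velocity."""
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, grad, m, s, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam keeps a moving average of gradients (m) and of squared
    gradients (s), then scales the step per parameter: larger steps in
    flat directions, smaller steps in steep or noisy ones."""
    m = b1 * m + (1 - b1) * grad
    s = b2 * s + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias correction for early steps
    s_hat = s / (1 - b2 ** t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps), m, s

# One update on a single weight with gradient 0.5
w, v = 1.0, 0.0
w, v = sgd_momentum_step(w, 0.5, v)
print(w)       # moved by lr * grad = 0.005

w2, m, s = 1.0, 0.0, 0.0
w2, m, s = adam_step(w2, 0.5, m, s, t=1)
print(w2)      # Adam's first step is ≈ lr, regardless of the gradient's size
```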

Batch Normalization & Dropout — Tricks That Work

Two powerful techniques that make neural networks train better:

Batch Normalization: Normalises the data flowing between layers so the numbers don't get too big or too small. Think of it as "resetting the scale" at each layer. Result: faster, more stable training.

Dropout: Randomly turns off some neurons during training (e.g., 40% of them each step). Why? This forces the network not to rely on any single neuron — like a sports team where everyone needs to be good, not just one star player. This prevents overfitting!
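Dropout is simple enough to write out by hand. Here is a sketch of "inverted dropout" (the variant Keras uses): during training some activations are zeroed and the survivors are scaled up so the expected value is unchanged; at inference the layer does nothing:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(a, rate=0.4, training=True):
    if not training:
        return a                            # inference: the layer is a no-op
    mask = rng.random(a.shape) >= rate      # keep each activation with prob 1 - rate
    return a * mask / (1.0 - rate)          # scale survivors so E[output] == input

a = np.ones(100_000)
out = dropout(a, rate=0.4)
print((out == 0).mean())    # ≈ 0.4 of the activations were switched off
print(out.mean())           # ≈ 1.0, so the expected value is preserved
```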

💻 Code Implementation

Step 1: Building a Feedforward Neural Network

python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, callbacks
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification

# ── Reproducibility ───────────────────────────────────────
tf.random.set_seed(42)
np.random.seed(42)
print(f"TensorFlow Version: {tf.__version__}")

# ── Generate Synthetic Dataset ────────────────────────────
X, y = make_classification(
    n_samples=10000, n_features=20, n_informative=15,
    n_redundant=5, n_classes=2, random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

# ── Build the Model ───────────────────────────────────────
def build_model(input_dim, learning_rate=0.001):
    model = keras.Sequential([
        layers.Input(shape=(input_dim,)),
        layers.Dense(256, use_bias=False),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dropout(0.4),
        layers.Dense(128, use_bias=False),
        layers.BatchNormalization(),
        layers.Activation("relu"),
        layers.Dropout(0.3),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.2),
        layers.Dense(1, activation="sigmoid"),
    ], name="deep_classifier")

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy", keras.metrics.AUC(name="auc")]
    )
    return model

model = build_model(input_dim=X_train.shape[1])
model.summary()

🔧 Troubleshooting

| ❌ Error | 🔍 Cause | ✅ Fix |
| --- | --- | --- |
| ModuleNotFoundError: No module named "tensorflow" | TensorFlow not installed | pip install tensorflow |
| CUDA out of memory | GPU memory exhausted | Reduce batch_size, use tf.keras.mixed_precision |

Step 2: Training with Callbacks

python
# ── Define Callbacks ──────────────────────────────────────
cb_early_stop = callbacks.EarlyStopping(
    monitor="val_loss", patience=15,
    restore_best_weights=True, verbose=1
)
cb_reduce_lr = callbacks.ReduceLROnPlateau(
    monitor="val_loss", factor=0.5,
    patience=7, min_lr=1e-6, verbose=1
)
cb_checkpoint = callbacks.ModelCheckpoint(
    filepath="best_model.keras",
    monitor="val_auc", save_best_only=True,
    mode="max", verbose=1
)

# ── Train ─────────────────────────────────────────────────
history = model.fit(
    X_train, y_train,
    epochs=200, batch_size=256,
    validation_split=0.15,
    callbacks=[cb_early_stop, cb_reduce_lr, cb_checkpoint],
    verbose=1
)

# ── Evaluate ──────────────────────────────────────────────
test_loss, test_acc, test_auc = model.evaluate(X_test, y_test, verbose=0)
print(f"\n🏆 Test Accuracy : {test_acc:.4f}")
print(f"🏆 Test AUC      : {test_auc:.4f}")

# ── Plot Training Curves ──────────────────────────────────
fig, axes = plt.subplots(1, 3, figsize=(18, 5))
for ax, metric, title in zip(axes, ["loss","accuracy","auc"], ["Loss","Accuracy","AUC"]):
    ax.plot(history.history[metric],        label="Train", linewidth=2)
    ax.plot(history.history[f"val_{metric}"], label="Val", linewidth=2, linestyle="--")
    ax.set_title(f"Training {title}", fontsize=13)
    ax.set_xlabel("Epoch"); ax.legend(); ax.grid(alpha=0.3)
plt.tight_layout()
plt.savefig("training_curves.png", dpi=150)
plt.show()

🔧 Troubleshooting

| ❌ Error | 🔍 Cause | ✅ Fix |
| --- | --- | --- |
| Loss is NaN from epoch 1 | Exploding gradients or bad LR | Use clipnorm=1.0 in the optimizer, lower the learning rate |
| Validation loss increases immediately | Overfitting | Increase the Dropout rate, add L2 regularization |

Step 3: CNN for Image Classification (CIFAR-10)

python
from tensorflow.keras.datasets import cifar10

# ── Load and Normalize ────────────────────────────────────
(X_train_c, y_train_c), (X_test_c, y_test_c) = cifar10.load_data()
X_train_c = X_train_c.astype("float32") / 255.0
X_test_c  = X_test_c.astype("float32")  / 255.0

CLASS_NAMES = [
    "airplane","automobile","bird","cat","deer",
    "dog","frog","horse","ship","truck"
]

# ── Build CNN ─────────────────────────────────────────────
def build_cnn():
    model = keras.Sequential([
        layers.Input(shape=(32, 32, 3)),
        layers.Conv2D(32, (3,3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(32, (3,3), padding="same", activation="relu"),
        layers.MaxPooling2D((2,2)),
        layers.Dropout(0.25),

        layers.Conv2D(64, (3,3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.Conv2D(64, (3,3), padding="same", activation="relu"),
        layers.MaxPooling2D((2,2)),
        layers.Dropout(0.25),

        layers.Conv2D(128, (3,3), padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.GlobalAveragePooling2D(),

        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(10, activation="softmax"),
    ], name="cifar10_cnn")

    model.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model

cnn = build_cnn()
cnn.summary()

# ── Data Augmentation ─────────────────────────────────────
data_aug = keras.Sequential([
    layers.RandomFlip("horizontal"),
    layers.RandomRotation(0.1),
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.1, 0.1),
], name="augmentation")

# ── Train ─────────────────────────────────────────────────
# Augmentation must run per batch inside the input pipeline:
# calling data_aug() once on the whole array outside fit()
# runs the layers in inference mode, i.e. no augmentation at all.
AUTOTUNE = tf.data.AUTOTUNE
train_ds = (
    tf.data.Dataset.from_tensor_slices((X_train_c, y_train_c))
    .shuffle(10_000).batch(128)
    .map(lambda x, y: (data_aug(x, training=True), y),
         num_parallel_calls=AUTOTUNE)
    .prefetch(AUTOTUNE)
)
cnn_history = cnn.fit(
    train_ds, epochs=50,
    validation_data=(X_test_c, y_test_c),
    callbacks=[
        callbacks.EarlyStopping(monitor="val_accuracy", patience=10, restore_best_weights=True),
        callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=5)
    ], verbose=1
)
_, test_accuracy = cnn.evaluate(X_test_c, y_test_c, verbose=0)
print(f"\n🏆 CIFAR-10 Test Accuracy: {test_accuracy * 100:.2f}%")

🔧 Troubleshooting

| ❌ Error | 🔍 Cause | ✅ Fix |
| --- | --- | --- |
| ResourceExhaustedError | Dataset too large for memory | Use tf.data.Dataset with .prefetch() and .cache() |
| Training very slow on CPU | No GPU available | Reduce model size, use smaller batches, try Google Colab |

🏗️ Practical Project

Customer Churn Prediction

Build a deep learning model that predicts whether a telecom customer will leave. Handles class imbalance with class weights and uses a production-ready inference function with risk classification.

python
import pandas as pd
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, callbacks
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import classification_report

# ── Simulated Telco Churn Dataset ─────────────────────────
np.random.seed(42)
n = 7000
tenure          = np.random.randint(1, 72, n)
monthly_charges = np.round(np.random.uniform(20, 120, n), 2)

df = pd.DataFrame({
    "tenure": tenure,
    "monthly_charges": monthly_charges,
    "total_charges": np.round(np.random.uniform(100, 8000, n), 2),
    "num_products": np.random.randint(1, 6, n),
    "tech_support": np.random.choice([0, 1], n),
    "online_backup": np.random.choice([0, 1], n),
    "senior_citizen": np.random.choice([0, 1], n, p=[0.84, 0.16]),
    "contract_type": np.random.choice(["Month","One_Year","Two_Year"], n),
    "internet_service": np.random.choice(["DSL","Fiber","No"], n),
    # Churn depends on the actual feature columns (plus 5% random noise)
    # so the model has real signal to learn: new customers on high
    # monthly charges are the ones who leave.
    "churn": ((np.random.uniform(0, 1, n) < 0.05) |
              ((monthly_charges > 80) & (tenure < 20))).astype(int)
})

le = LabelEncoder()
df["contract_type"]    = le.fit_transform(df["contract_type"])
df["internet_service"] = le.fit_transform(df["internet_service"])
df["charge_per_month"] = df["total_charges"] / (df["tenure"] + 1)

X = df.drop("churn", axis=1).values
y = df["churn"].values

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test  = scaler.transform(X_test)

# ── Class Weights for Imbalanced Data ─────────────────────
neg, pos = np.bincount(y_train)
class_weight = {0: (1/neg)*(len(y_train)/2.0), 1: (1/pos)*(len(y_train)/2.0)}

# ── Build Model ───────────────────────────────────────────
churn_model = keras.Sequential([
    layers.Input(shape=(X_train.shape[1],)),
    layers.Dense(128, use_bias=False),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.35),
    layers.Dense(64, use_bias=False),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.25),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
], name="churn_predictor")

churn_model.compile(
    optimizer=keras.optimizers.Adam(5e-4),
    loss="binary_crossentropy",
    metrics=["accuracy", keras.metrics.AUC(name="auc")]
)

# ── Train ─────────────────────────────────────────────────
history = churn_model.fit(
    X_train, y_train, epochs=100, batch_size=64,
    validation_split=0.15, class_weight=class_weight,
    callbacks=[callbacks.EarlyStopping(
        monitor="val_auc", patience=15,
        restore_best_weights=True, mode="max"
    )], verbose=0
)

y_prob = churn_model.predict(X_test).flatten()
y_pred = (y_prob >= 0.50).astype(int)
print("\n📊 Classification Report:")
print(classification_report(y_test, y_pred, target_names=["No Churn", "Churn"]))

# ── Production Inference ──────────────────────────────────
def predict_churn(customer_features: np.ndarray) -> dict:
    scaled = scaler.transform(customer_features.reshape(1, -1))
    prob = float(churn_model.predict(scaled, verbose=0)[0][0])
    risk = "🔴 HIGH" if prob > 0.70 else "🟡 MEDIUM" if prob > 0.40 else "🟢 LOW"
    return {"churn_probability": round(prob, 4), "risk_level": risk}

# X_test is already scaled, so invert back to raw features for the demo
sample_raw = scaler.inverse_transform(X_test[0].reshape(1, -1))[0]
result = predict_churn(sample_raw)
print(f"\nChurn Probability: {result['churn_probability']:.1%}")
print(f"Risk: {result['risk_level']}")

🔧 Troubleshooting

| ❌ Error | 🔍 Cause | ✅ Fix |
| --- | --- | --- |
| Accuracy stuck at ~50% | Class imbalance | Verify class_weight is being passed correctly |
| Accuracy stuck at the class distribution | Model predicts the majority class only | Balance the dataset, use a weighted loss, check label encoding |
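The first fix above says to verify the class weights. scikit-learn ships a helper that computes the same "balanced" weighting as the manual formula in the project code; a quick sanity check with toy labels:

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y_train = np.array([0] * 900 + [1] * 100)   # heavily imbalanced toy labels

weights = compute_class_weight(
    "balanced", classes=np.array([0, 1]), y=y_train
)
class_weight = dict(enumerate(weights))     # pass this dict to model.fit()
print(class_weight)                          # the minority class gets the larger weight
```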

Topics Covered

#NeuralNetworks #CNN #Backpropagation #TensorFlow #Keras #Adam