👁️
Module 4
Computer Vision: Image Processing & Detection
⏱️ 10–14 hours · 📊 Advanced · 🧩 2 Code Blocks · 🏗️ 1 Project
🎯 Learning Objectives
- ✓Understand how computers "see" images — from pixels to features to objects.
- ✓Learn convolutional operations, transfer learning, and object detection — with visual explanations.
- ✓Use OpenCV for classical image processing and TensorFlow for deep learning on images.
- ✓Build a complete Real-Time Object Detection & Classification System.
📋 Prerequisites
Modules 1, 2 & 3 completed
Comfortable with neural networks from Module 2
You've come a long way — this is the final module!
📐 Technical Theory
2D Convolution — The Math
A 2D convolution slides a kernel K over an input I. Each filter learns to detect a specific pattern (edge, curve, texture). Stride controls how many pixels the filter moves; padding can maintain spatial dimensions.
(I * K)[i, j] = Σₘ Σₙ I[i+m, j+n] · K[m, n]
output_size = ⌊(input_size - kernel_size + 2·padding) / stride⌋ + 1
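To make the formulas concrete, here is a minimal NumPy sketch of the convolution used in deep learning (technically cross-correlation, matching the sum above). The `conv2d` helper and the input sizes are illustrative, not an OpenCV or TensorFlow API:

```python
import numpy as np

def conv2d(I, K, stride=1, padding=0):
    """Naive 2D convolution (deep-learning convention, i.e. cross-correlation)."""
    if padding:
        I = np.pad(I, padding)
    kh, kw = K.shape
    out_h = (I.shape[0] - kh) // stride + 1   # matches ⌊(n - k + 2p)/s⌋ + 1
    out_w = (I.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = I[i*stride:i*stride + kh, j*stride:j*stride + kw]
            out[i, j] = np.sum(patch * K)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
out = conv2d(img, edge_kernel)
print(out.shape)   # (4, 4): ⌊(6 - 3 + 0)/1⌋ + 1 = 4
```

The 3×3 kernel here is the Sobel-x filter, exactly the kind of vertical-edge detector the early CNN layers below end up learning on their own.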
What Each Layer Learns
CNN feature maps reveal a beautiful hierarchy:
• Early layers (1-2): Horizontal/vertical edges, color gradients
• Middle layers (3-5): Textures — grids, corners, circles
• Deep layers (6+): High-level semantics — faces, wheels, eyes
This hierarchical composition is why CNNs are so powerful for vision.
Transfer Learning — Standing on Giants' Shoulders
Training a ResNet-50 from scratch requires ImageNet-scale data (roughly 1.3 million labeled training images) and days to weeks of GPU compute. Transfer learning instead:
1. Start with a pre-trained model (ImageNet weights)
2. Freeze the base convolutional layers
3. Replace the final classification layer
4. Fine-tune on your small domain-specific dataset
This works because low-level features are universal.
ResNet — Skip Connections
Very deep networks suffer from vanishing gradients. The Residual Connection creates a highway for gradients: output = F(x) + x
Instead of learning H(x) directly, the layers learn the residual F(x) = H(x) - x. If identity mapping is optimal, F(x) → 0. This enables training networks 1000+ layers deep.
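The identity-shortcut idea can be sketched in a few lines of NumPy. The two-layer residual branch, its weights, and the shapes are made up for illustration; the point is that when F(x) is near zero, the block reduces to the identity:

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_branch(x, W1, W2):
    # F(x): linear -> ReLU -> linear
    return np.maximum(x @ W1, 0.0) @ W2

x = rng.normal(size=(4, 8))              # a batch of 4 feature vectors
W1 = rng.normal(size=(8, 8)) * 0.01      # near-zero weights, so F(x) ≈ 0
W2 = rng.normal(size=(8, 8)) * 0.01

out = residual_branch(x, W1, W2) + x     # skip connection: output = F(x) + x

# With F(x) ≈ 0 the block is ≈ identity, so signal (and gradient) pass through
assert np.allclose(out, x, atol=1e-2)
```

This is why depth stops hurting: a layer that has nothing useful to add can simply drive F toward zero instead of having to learn an exact identity mapping.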
Object Detection Methods
YOLO (You Only Look Once) divides the image into a grid and predicts bounding boxes and class probabilities in a single forward pass — enabling real-time detection.
| Task | Output | Example Method |
|---|---|---|
| Classification | Class label | ResNet, VGG |
| Localization | Class + bounding box | Two-head architecture |
| Object Detection | Multiple classes + boxes | YOLO, Faster R-CNN, SSD |
| Segmentation | Per-pixel class | U-Net, Mask R-CNN |
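Detectors like YOLO emit many overlapping candidate boxes, so duplicates are pruned with Intersection-over-Union (IoU) and non-max suppression. A minimal NumPy sketch; the `[x1, y1, x2, y2]` box format and helper names are illustrative, not a specific library's API:

```python
import numpy as np

def iou(a, b):
    """Intersection over Union of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression: keep the best box, drop heavy overlaps."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order):
        best = order[0]
        keep.append(int(best))
        order = order[1:][[iou(boxes[best], boxes[i]) < iou_thresh for i in order[1:]]]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2]: box 1 overlaps box 0 too much and is dropped
```

Every method in the detection row of the table (YOLO, Faster R-CNN, SSD) relies on some variant of this post-processing step.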
💻 Code Implementation
Step 1: Image Processing with OpenCV
python
import cv2
import numpy as np
import matplotlib.pyplot as plt
def load_sample_image() -> np.ndarray:
    """Generate a synthetic test image."""
    img = np.zeros((400, 600, 3), dtype=np.uint8)
    cv2.rectangle(img, (50, 50), (200, 200), (0, 100, 255), -1)
    cv2.circle(img, (400, 200), 100, (0, 255, 100), -1)
    cv2.putText(img, "OpenCV Demo", (100, 350),
                cv2.FONT_HERSHEY_DUPLEX, 1.5, (255, 255, 255), 2)
    return img
img_bgr = load_sample_image()
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
# ── Processing Operations ─────────────────────────────────
blurred = cv2.GaussianBlur(img_gray, (15, 15), sigmaX=3)
edges = cv2.Canny(img_gray, threshold1=50, threshold2=150)
sobel_x = cv2.Sobel(img_gray, cv2.CV_64F, dx=1, dy=0, ksize=3)
sobel_y = cv2.Sobel(img_gray, cv2.CV_64F, dx=0, dy=1, ksize=3)
magnitude = np.sqrt(sobel_x**2 + sobel_y**2)
sobel_mag = np.uint8(np.clip(magnitude / magnitude.max() * 255, 0, 255))
kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(edges, kernel, iterations=2)
eroded = cv2.erode(dilated, kernel, iterations=1)
# ── Visualization ─────────────────────────────────────────
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
images = [
    (img_rgb, "Original RGB", None),
    (img_gray, "Grayscale", "gray"),
    (blurred, "Gaussian Blur", "gray"),
    (edges, "Canny Edges", "gray"),
    (sobel_mag, "Sobel Magnitude", "gray"),
    (eroded, "Morphology", "gray"),
]
for ax, (image, title, cmap) in zip(axes.flat, images):
    ax.imshow(image, cmap=cmap)
    ax.set_title(title, fontsize=13, fontweight="bold")
    ax.axis("off")
plt.suptitle("Classical Computer Vision — OpenCV", fontsize=16, y=1.01)
plt.tight_layout()
plt.savefig("cv_operations.png", dpi=150, bbox_inches="tight")
plt.show()

🔧 Troubleshooting
❌ Error: cv2.error: (-215) !_src.empty()
🔍 Cause: Image file not found or wrong path
✅ Fix: Verify the path with os.path.exists(); use raw strings r"C:\path" on Windows

❌ Error: ModuleNotFoundError: No module named 'cv2'
🔍 Cause: OpenCV not installed
✅ Fix: pip install opencv-python
Step 2: Transfer Learning with ResNet50
python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, applications, callbacks
IMG_SIZE = (224, 224)
BATCH_SIZE = 32
NUM_CLASSES = 5
def build_transfer_model(num_classes, base_trainable=False):
    """Two-phase transfer learning with ResNet50."""
    base_model = applications.ResNet50(
        weights="imagenet", include_top=False,
        input_shape=(*IMG_SIZE, 3)
    )
    base_model.trainable = base_trainable
    if base_trainable:
        # Phase 2: unfreeze only the last 30 layers for fine-tuning
        for layer in base_model.layers[:-30]:
            layer.trainable = False
    # Inputs are assumed already scaled with applications.resnet.preprocess_input
    inputs = keras.Input(shape=(*IMG_SIZE, 3))
    x = base_model(inputs, training=base_trainable)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs, name="resnet50_transfer")
    # Lower learning rate when fine-tuning, to avoid destroying pre-trained weights
    lr = 1e-3 if not base_trainable else 1e-5
    model.compile(
        optimizer=keras.optimizers.Adam(lr),
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model
# ── Build ─────────────────────────────────────────────────
phase1_model = build_transfer_model(NUM_CLASSES, base_trainable=False)
phase1_model.summary()
print("\n✅ ResNet50 Transfer Learning ready.")
print(" Provide a directory with class subfolders to train.")
print("   data/train/class_1/*.jpg, data/train/class_2/*.jpg, ...")

🔧 Troubleshooting
❌ Error: ResourceExhaustedError during CNN training
🔍 Cause: GPU out of memory with large images
✅ Fix: Reduce IMG_SIZE, use batch_size=8, enable mixed precision

❌ Error: ValueError: Input 0 is incompatible
🔍 Cause: Image shape mismatch
✅ Fix: Ensure images are resized to (224, 224, 3)
🏗️ Practical Project
End-to-End Image Classification System
A production-ready image classifier using EfficientNetV2S with a multi-dropout head, inference-optimized pipeline, and batch processing support.
python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import applications, layers
from PIL import Image
import io, time
CLASSES = ["cats", "dogs", "birds", "cars", "flowers"]
IMG_SIZE = (224, 224)
def build_production_classifier(num_classes):
    base = applications.EfficientNetV2S(
        weights="imagenet", include_top=False,
        input_shape=(*IMG_SIZE, 3),
    )
    base.trainable = False
    inputs = keras.Input(shape=(*IMG_SIZE, 3), name="image_input")
    x = applications.efficientnet_v2.preprocess_input(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(num_classes, activation="softmax", name="predictions")(x)
    model = keras.Model(inputs, x, name="production_classifier")
    model.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss="categorical_crossentropy", metrics=["accuracy"]
    )
    return model
def preprocess_image(image_input) -> np.ndarray:
    if isinstance(image_input, str):
        img = Image.open(image_input).convert("RGB")
    elif isinstance(image_input, bytes):
        img = Image.open(io.BytesIO(image_input)).convert("RGB")
    elif isinstance(image_input, np.ndarray):
        img = Image.fromarray(image_input.astype(np.uint8)).convert("RGB")
    else:
        img = image_input.convert("RGB")
    img = img.resize(IMG_SIZE, Image.LANCZOS)
    return np.expand_dims(np.array(img, dtype=np.float32), axis=0)
class ImageClassificationService:
    def __init__(self, model, class_names):
        self.model = model
        self.class_names = class_names
        self._warmup()

    def _warmup(self):
        dummy = np.zeros((1, *IMG_SIZE, 3), dtype=np.float32)
        _ = self.model.predict(dummy, verbose=0)
        print("✅ Model warmed up — ready for inference.")

    def predict_single(self, image_input, top_k=3):
        start = time.perf_counter()
        arr = preprocess_image(image_input)
        probs = self.model.predict(arr, verbose=0)[0]
        elapsed = (time.perf_counter() - start) * 1000
        top_indices = np.argsort(probs)[::-1][:top_k]
        predictions = [{
            "rank": i + 1,
            "class": self.class_names[idx],
            "confidence": float(probs[idx]),
            "percentage": f"{probs[idx]:.1%}",
        } for i, idx in enumerate(top_indices)]
        return {
            "top_prediction": predictions[0]["class"],
            "confidence": predictions[0]["confidence"],
            "all_predictions": predictions,
            "latency_ms": round(elapsed, 2),
        }
# ── Initialize & Demo ────────────────────────────────────
model = build_production_classifier(len(CLASSES))
service = ImageClassificationService(model, CLASSES)
dummy_image = np.random.randint(0, 255, (*IMG_SIZE, 3), dtype=np.uint8)
result = service.predict_single(dummy_image, top_k=3)
print(f"\n🏆 Prediction: {result['top_prediction'].upper()}")
print(f"📊 Confidence: {result['confidence']:.1%}")
print(f"⚡ Latency: {result['latency_ms']} ms")
for p in result["all_predictions"]:
    bar = "█" * int(p["confidence"] * 20)
    print(f"   #{p['rank']} {p['class']:10s} {bar:<20} {p['percentage']}")

🔧 Troubleshooting
❌ Error: AttributeError: EfficientNetV2S
🔍 Cause: TensorFlow version too old (EfficientNetV2 models landed in 2.8+)
✅ Fix: Upgrade: pip install "tensorflow>=2.10.0"

❌ Error: Low accuracy despite many epochs
🔍 Cause: Insufficient training data
✅ Fix: Use stronger augmentation, reduce model complexity, collect more data
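One design note on predict_single: np.argsort sorts all class probabilities, which is fine for 5 classes, but with thousands of classes np.argpartition finds the top-k without a full sort. A sketch with made-up probability values:

```python
import numpy as np

probs = np.array([0.10, 0.50, 0.05, 0.30, 0.05])
k = 3

# Partial selection: grab the k largest entries (unordered), then sort only those
top_k = np.argpartition(probs, -k)[-k:]
top_k = top_k[np.argsort(probs[top_k])[::-1]]

print(top_k.tolist())   # [1, 3, 0]
```

For a 5-class demo the difference is negligible; it starts to matter for ImageNet-sized (1000+) label spaces served at high request rates.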
Topics Covered
#OpenCV · #ResNet50 · #Transfer Learning · #Object Detection · #YOLO · #EfficientNet