👁️
Module 4
Computer Vision: Image Processing & Detection
⏱️ 10–14 hours · 📊 Advanced · 🧩 2 Code Blocks · 🏗️ 1 Project
🎯 Learning Objectives
- ✓Understand how computers "see" images — from pixels to features to objects.
- ✓Learn convolutional operations, transfer learning, and object detection — with visual explanations.
- ✓Use OpenCV for classical image processing and TensorFlow for deep learning on images.
- ✓Build a complete Real-Time Object Detection & Classification System.
📋 Prerequisites
Modules 1, 2 & 3 completed
Comfortable with neural networks from Module 2
You've come a long way — this is the final module!
📐 Technical Theory
2D Convolution — The Math
A 2D convolution slides a kernel K over an input I. Each filter learns to detect a specific pattern (edge, curve, texture). Stride controls how many pixels the filter moves; padding can maintain spatial dimensions.
(I * K)[i, j] = Σₘ Σₙ I[i+m, j+n] · K[m, n]
output_size = ⌊(input_size - kernel_size + 2·padding) / stride⌋ + 1
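To make the formulas concrete, here is a minimal NumPy sketch of the convolution used in deep learning (technically cross-correlation, matching the sum above). The `conv2d` helper and the input sizes are illustrative, not an OpenCV or TensorFlow API:

```python
import numpy as np

def conv2d(I, K, stride=1, padding=0):
    """Naive 2D convolution (deep-learning convention, i.e. cross-correlation)."""
    if padding:
        I = np.pad(I, padding)
    kh, kw = K.shape
    out_h = (I.shape[0] - kh) // stride + 1   # matches ⌊(n - k + 2p)/s⌋ + 1
    out_w = (I.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = I[i*stride:i*stride + kh, j*stride:j*stride + kw]
            out[i, j] = np.sum(patch * K)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
out = conv2d(img, edge_kernel)
print(out.shape)   # (4, 4): ⌊(6 - 3 + 0)/1⌋ + 1 = 4
```

The 3×3 kernel here is the Sobel-x filter, exactly the kind of vertical-edge detector the early CNN layers below end up learning on their own.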
What Each Layer Learns
CNN feature maps reveal a beautiful hierarchy:
• Early layers (1-2): Horizontal/vertical edges, color gradients
• Middle layers (3-5): Textures — grids, corners, circles
• Deep layers (6+): High-level semantics — faces, wheels, eyes
This hierarchical composition is why CNNs are so powerful for vision.
Transfer Learning — Standing on Giants' Shoulders
Training a ResNet-50 from scratch requires ImageNet-scale data (roughly 1.3 million labeled training images) and days to weeks of GPU compute. Transfer learning instead:
1. Start with a pre-trained model (ImageNet weights)
2. Freeze the base convolutional layers
3. Replace the final classification layer
4. Fine-tune on your small domain-specific dataset
This works because low-level features are universal.
ResNet — Skip Connections
Very deep networks suffer from vanishing gradients. The Residual Connection creates a highway for gradients: output = F(x) + x
Instead of learning H(x) directly, the layers learn the residual F(x) = H(x) - x. If identity mapping is optimal, F(x) → 0. This enables training networks 1000+ layers deep.
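The identity-shortcut idea can be sketched in a few lines of NumPy. The two-layer residual branch, its weights, and the shapes are made up for illustration; the point is that when F(x) is near zero, the block reduces to the identity:

```python
import numpy as np

rng = np.random.default_rng(0)

def residual_branch(x, W1, W2):
    # F(x): linear -> ReLU -> linear
    return np.maximum(x @ W1, 0.0) @ W2

x = rng.normal(size=(4, 8))              # a batch of 4 feature vectors
W1 = rng.normal(size=(8, 8)) * 0.01      # near-zero weights, so F(x) ≈ 0
W2 = rng.normal(size=(8, 8)) * 0.01

out = residual_branch(x, W1, W2) + x     # skip connection: output = F(x) + x

# With F(x) ≈ 0 the block is ≈ identity, so signal (and gradient) pass through
assert np.allclose(out, x, atol=1e-2)
```

This is why depth stops hurting: a layer that has nothing useful to add can simply drive F toward zero instead of having to learn an exact identity mapping.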
Object Detection Methods
YOLO (You Only Look Once) divides the image into a grid and predicts bounding boxes and class probabilities in a single forward pass — enabling real-time detection.
| Task | Output | Example Method |
|---|---|---|
| Classification | Class label | ResNet, VGG |
| Localization | Class + bounding box | Two-head architecture |
| Object Detection | Multiple classes + boxes | YOLO, Faster R-CNN, SSD |
| Segmentation | Per-pixel class | U-Net, Mask R-CNN |
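Detectors like YOLO emit many overlapping candidate boxes, so duplicates are pruned with Intersection-over-Union (IoU) and non-max suppression. A minimal NumPy sketch; the `[x1, y1, x2, y2]` box format and helper names are illustrative, not a specific library's API:

```python
import numpy as np

def iou(a, b):
    """Intersection over Union of two boxes in [x1, y1, x2, y2] format."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-max suppression: keep the best box, drop heavy overlaps."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order):
        best = order[0]
        keep.append(int(best))
        order = order[1:][[iou(boxes[best], boxes[i]) < iou_thresh for i in order[1:]]]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))   # [0, 2]: box 1 overlaps box 0 too much and is dropped
```

Every method in the detection row of the table (YOLO, Faster R-CNN, SSD) relies on some variant of this post-processing step.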
💻 Code Implementation
Step 1: Image Processing with OpenCV
python
import cv2
import numpy as np
import matplotlib.pyplot as plt
def load_sample_image() -> np.ndarray:
    """Generate a synthetic test image."""
    img = np.zeros((400, 600, 3), dtype=np.uint8)
    cv2.rectangle(img, (50, 50), (200, 200), (0, 100, 255), -1)
    cv2.circle(img, (400, 200), 100, (0, 255, 100), -1)
    cv2.putText(img, "OpenCV Demo", (100, 350),
                cv2.FONT_HERSHEY_DUPLEX, 1.5, (255, 255, 255), 2)
    return img
img_bgr = load_sample_image()
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)
img_gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
# ── Processing Operations ─────────────────────────────────
blurred = cv2.GaussianBlur(img_gray, (15, 15), sigmaX=3)
edges = cv2.Canny(img_gray, threshold1=50, threshold2=150)
sobel_x = cv2.Sobel(img_gray, cv2.CV_64F, dx=1, dy=0, ksize=3)
sobel_y = cv2.Sobel(img_gray, cv2.CV_64F, dx=0, dy=1, ksize=3)
magnitude = np.sqrt(sobel_x**2 + sobel_y**2)
sobel_mag = np.uint8(np.clip(magnitude / magnitude.max() * 255, 0, 255))
kernel = np.ones((5, 5), np.uint8)
dilated = cv2.dilate(edges, kernel, iterations=2)
eroded = cv2.erode(dilated, kernel, iterations=1)
# ── Visualization ─────────────────────────────────────────
fig, axes = plt.subplots(2, 3, figsize=(18, 10))
images = [
    (img_rgb, "Original RGB", None),
    (img_gray, "Grayscale", "gray"),
    (blurred, "Gaussian Blur", "gray"),
    (edges, "Canny Edges", "gray"),
    (sobel_mag, "Sobel Magnitude", "gray"),
    (eroded, "Morphology", "gray"),
]
for ax, (image, title, cmap) in zip(axes.flat, images):
    ax.imshow(image, cmap=cmap)
    ax.set_title(title, fontsize=13, fontweight="bold")
    ax.axis("off")
plt.suptitle("Classical Computer Vision — OpenCV", fontsize=16, y=1.01)
plt.tight_layout()
plt.savefig("cv_operations.png", dpi=150, bbox_inches="tight")
plt.show()

🔧 Troubleshooting
❌ Error: cv2.error: (-215) !_src.empty()
🔍 Cause: Image file not found or wrong path
✅ Fix: Verify the path with os.path.exists(); use raw strings r"C:\path" on Windows

❌ Error: ModuleNotFoundError: No module named 'cv2'
🔍 Cause: OpenCV not installed
✅ Fix: pip install opencv-python
Step 2: Transfer Learning with ResNet50
python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, applications, callbacks
IMG_SIZE = (224, 224)
BATCH_SIZE = 32
NUM_CLASSES = 5
def build_transfer_model(num_classes, base_trainable=False):
    """Two-phase transfer learning with ResNet50."""
    base_model = applications.ResNet50(
        weights="imagenet", include_top=False,
        input_shape=(*IMG_SIZE, 3)
    )
    base_model.trainable = base_trainable
    if base_trainable:
        # Phase 2: unfreeze only the last 30 layers for fine-tuning
        for layer in base_model.layers[:-30]:
            layer.trainable = False
    # Inputs are assumed already scaled with applications.resnet.preprocess_input
    inputs = keras.Input(shape=(*IMG_SIZE, 3))
    x = base_model(inputs, training=base_trainable)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dropout(0.3)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    model = keras.Model(inputs, outputs, name="resnet50_transfer")
    # Lower learning rate when fine-tuning, to avoid destroying pre-trained weights
    lr = 1e-3 if not base_trainable else 1e-5
    model.compile(
        optimizer=keras.optimizers.Adam(lr),
        loss="categorical_crossentropy",
        metrics=["accuracy"]
    )
    return model
# ── Build ─────────────────────────────────────────────────
phase1_model = build_transfer_model(NUM_CLASSES, base_trainable=False)
phase1_model.summary()
print("\n✅ ResNet50 Transfer Learning ready.")
print(" Provide a directory with class subfolders to train.")
print("   data/train/class_1/*.jpg, data/train/class_2/*.jpg, ...")

🔧 Troubleshooting
❌ Error: ResourceExhaustedError during CNN training
🔍 Cause: GPU out of memory with large images
✅ Fix: Reduce IMG_SIZE, use batch_size=8, enable mixed precision

❌ Error: ValueError: Input 0 is incompatible
🔍 Cause: Image shape mismatch
✅ Fix: Ensure images are resized to (224, 224, 3)
🏗️ Practical Project
End-to-End Image Classification System
A production-ready image classifier using EfficientNetV2S with a multi-dropout head, inference-optimized pipeline, and batch processing support.
python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import applications, layers
from PIL import Image
import io, time
CLASSES = ["cats", "dogs", "birds", "cars", "flowers"]
IMG_SIZE = (224, 224)
def build_production_classifier(num_classes):
    base = applications.EfficientNetV2S(
        weights="imagenet", include_top=False,
        input_shape=(*IMG_SIZE, 3),
    )
    base.trainable = False
    inputs = keras.Input(shape=(*IMG_SIZE, 3), name="image_input")
    x = applications.efficientnet_v2.preprocess_input(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(num_classes, activation="softmax", name="predictions")(x)
    model = keras.Model(inputs, x, name="production_classifier")
    model.compile(
        optimizer=keras.optimizers.Adam(1e-3),
        loss="categorical_crossentropy", metrics=["accuracy"]
    )
    return model
def preprocess_image(image_input) -> np.ndarray:
    if isinstance(image_input, str):
        img = Image.open(image_input).convert("RGB")
    elif isinstance(image_input, bytes):
        img = Image.open(io.BytesIO(image_input)).convert("RGB")
    elif isinstance(image_input, np.ndarray):
        img = Image.fromarray(image_input.astype(np.uint8)).convert("RGB")
    else:
        img = image_input.convert("RGB")
    img = img.resize(IMG_SIZE, Image.LANCZOS)
    return np.expand_dims(np.array(img, dtype=np.float32), axis=0)
class ImageClassificationService:
    def __init__(self, model, class_names):
        self.model = model
        self.class_names = class_names
        self._warmup()

    def _warmup(self):
        dummy = np.zeros((1, *IMG_SIZE, 3), dtype=np.float32)
        _ = self.model.predict(dummy, verbose=0)
        print("✅ Model warmed up — ready for inference.")

    def predict_single(self, image_input, top_k=3):
        start = time.perf_counter()
        arr = preprocess_image(image_input)
        probs = self.model.predict(arr, verbose=0)[0]
        elapsed = (time.perf_counter() - start) * 1000
        top_indices = np.argsort(probs)[::-1][:top_k]
        predictions = [{
            "rank": i + 1,
            "class": self.class_names[idx],
            "confidence": float(probs[idx]),
            "percentage": f"{probs[idx]:.1%}",
        } for i, idx in enumerate(top_indices)]
        return {
            "top_prediction": predictions[0]["class"],
            "confidence": predictions[0]["confidence"],
            "all_predictions": predictions,
            "latency_ms": round(elapsed, 2),
        }
# ── Initialize & Demo ────────────────────────────────────
model = build_production_classifier(len(CLASSES))
service = ImageClassificationService(model, CLASSES)
dummy_image = np.random.randint(0, 255, (*IMG_SIZE, 3), dtype=np.uint8)
result = service.predict_single(dummy_image, top_k=3)
print(f"\n🏆 Prediction: {result['top_prediction'].upper()}")
print(f"📊 Confidence: {result['confidence']:.1%}")
print(f"⚡ Latency: {result['latency_ms']} ms")
for p in result["all_predictions"]:
    bar = "█" * int(p["confidence"] * 20)
    print(f"   #{p['rank']} {p['class']:10s} {bar:<20} {p['percentage']}")

🔧 Troubleshooting
❌ Error: AttributeError: EfficientNetV2S
🔍 Cause: TensorFlow version too old (EfficientNetV2 models landed in 2.8+)
✅ Fix: Upgrade: pip install "tensorflow>=2.10.0"

❌ Error: Low accuracy despite many epochs
🔍 Cause: Insufficient training data
✅ Fix: Use stronger augmentation, reduce model complexity, collect more data
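One design note on predict_single: np.argsort sorts all class probabilities, which is fine for 5 classes, but with thousands of classes np.argpartition finds the top-k without a full sort. A sketch with made-up probability values:

```python
import numpy as np

probs = np.array([0.10, 0.50, 0.05, 0.30, 0.05])
k = 3

# Partial selection: grab the k largest entries (unordered), then sort only those
top_k = np.argpartition(probs, -k)[-k:]
top_k = top_k[np.argsort(probs[top_k])[::-1]]

print(top_k.tolist())   # [1, 3, 0]
```

For a 5-class demo the difference is negligible; it starts to matter for ImageNet-sized (1000+) label spaces served at high request rates.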
Topics Covered
#OpenCV · #ResNet50 · #Transfer Learning · #Object Detection · #YOLO · #EfficientNet