so like many other weekends, i spent the last 2 days scratching my brain around an interesting problem - comparing two popular deep learning architectures, ResNet50 and ResNeXt50, on the CIFAR10 dataset.
The Setup
for those who don't know, ResNet introduced residual connections that helped train much deeper networks by solving the vanishing gradient problem. ResNeXt came later and introduced the concept of "cardinality" - basically parallel paths within the same block.
The Experiment
i set up a simple experiment:
- same dataset (CIFAR10)
- same training parameters
- same number of epochs
- only difference: architecture (ResNet50 vs ResNeXt50)
Initial Thoughts
honestly, i thought ResNeXt50 would perform better. why? because it's newer and has that fancy cardinality concept. but deep learning often surprises you.
The Code Setup
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
# Data preprocessing
transform = transforms.Compose([
transforms.RandomHorizontalFlip(),
transforms.RandomRotation(10),
transforms.ToTensor(),
transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
shuffle=True, num_workers=2)Training Loop
nothing fancy here, just standard training:
def train_model(model, criterion, optimizer, epochs=50):
for epoch in range(epochs):
running_loss = 0.0
for i, data in enumerate(trainloader, 0):
inputs, labels = data
optimizer.zero_grad()
outputs = model(inputs)
loss = criterion(outputs, labels)
loss.backward()
optimizer.step()
running_loss += loss.item()Results?
well, this is where it gets interesting. after 50 epochs:
- ResNet50: 92.3% accuracy
- ResNeXt50: 91.8% accuracy
surprised? i was. ResNet50 actually performed slightly better.
Why Though?
i think it comes down to:
- Dataset size: CIFAR10 is relatively small, ResNeXt might be overkill
- Regularization: ResNet's simpler architecture might generalize better
- Training time: ResNeXt took about 30% longer to train
The Real Learning
but here's what i actually learned:
- newer ≠ better always
- architecture choice depends on your specific use case
- sometimes simpler models work better on smaller datasets
Next Steps
thinking of trying:
- Different datasets (ImageNet maybe?)
- Different cardinality values for ResNeXt
- Maybe some ensemble methods
Conclusion
was this experiment groundbreaking? no. but it taught me to question assumptions and actually test things rather than just going with what's newer or more complex.
sometimes the old ways are the best ways. or at least good enough.
code available on github if anyone wants to reproduce or extend this experiment.