I am experimenting with different optimizers and learning rate schedulers to improve the performance of a Mask R-CNN model (40K input images, instance segmentation, 1 class). The baseline I am comparing against is the:
PURPLE LINE: SGD with StepLR(step_size=3, gamma=0.1)
My two attempts at improving it are the:
YELLOW LINE: AdamW(lr=5e-5) with ReduceLROnPlateau(patience=3, factor=0.75)
BLUE LINE: AdamW(lr=5e-5) with StepLR(step_size=3, gamma=0.1) and 2x the number of input images
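To make the comparison concrete, here is a rough pure-Python sketch of how the two scheduler policies above move the learning rate over epochs. This is a simplified simulation, not the actual PyTorch schedulers; the base LR and the example validation losses are made-up illustrative values.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """StepLR: multiply the LR by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def plateau_lr(base_lr, val_losses, patience=3, factor=0.75):
    """Simplified ReduceLROnPlateau: cut the LR by `factor` once the
    validation loss has failed to improve for more than `patience`
    consecutive epochs (mirrors torch's num_bad_epochs > patience rule)."""
    lr, best, bad = base_lr, float("inf"), 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, bad = loss, 0
        else:
            bad += 1
            if bad > patience:
                lr *= factor
                bad = 0
        history.append(lr)
    return history

# BLUE/PURPLE style decay: LR drops by 10x every 3 epochs.
blue_lrs = [step_lr(5e-5, e) for e in range(9)]

# YELLOW style decay: LR only drops after the loss stalls (made-up losses).
yellow_lrs = plateau_lr(5e-5, [1.0, 0.9, 0.95, 0.96, 0.97, 0.98])
```

One practical difference this highlights: StepLR(gamma=0.1) shrinks the LR aggressively on a fixed clock regardless of progress, while ReduceLROnPlateau(factor=0.75) makes smaller cuts and only when validation stops improving, which can keep the LR high for longer and produce noisier late-training metrics.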
While I'm excited to see the metrics go above the purple line, I'm very interested in what is happening with AP@IoU=0.5, which starts to wiggle and drop, whereas AP@IoU=0.75 seems more stable.
Could someone help me understand why this is happening and what it means?