Just out of interest in the algorithm... my guess is:
- for each possible overlap of images
-- see if all already-set pixels in the map match the screenshot
-- if so, count how many pixels matched
- choose the one that matched the most pixels
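To make sure I'm describing the same thing, here is roughly how I picture that guess in code. It's only a sketch in Python, not your actual implementation: the names (`count_matches`, `best_overlap`) are made up, pixels are plain integers, and `None` stands for a map pixel that hasn't been set yet.

```python
from typing import List, Optional, Tuple

Pixel = Optional[int]      # None marks a map pixel that is not set yet (assumption)
Grid = List[List[Pixel]]   # row-major grid of pixels


def count_matches(map_px: Grid, shot: Grid, ox: int, oy: int) -> int:
    """Count matching pixels with the screenshot placed at (ox, oy).

    Returns -1 as soon as one already-set map pixel disagrees.
    """
    matched = 0
    for y, row in enumerate(shot):
        for x, value in enumerate(row):
            my, mx = oy + y, ox + x
            # Screenshot pixels that fall outside the map would extend it.
            if not (0 <= my < len(map_px) and 0 <= mx < len(map_px[0])):
                continue
            known = map_px[my][mx]
            if known is None:
                continue       # nothing known here, so nothing to compare
            if known != value:
                return -1      # discard this location on the first mismatch
            matched += 1
    return matched


def best_overlap(map_px: Grid, shot: Grid) -> Optional[Tuple[int, int, int]]:
    """Try every offset with at least one overlapping cell; return (ox, oy, score)."""
    map_h, map_w = len(map_px), len(map_px[0])
    shot_h, shot_w = len(shot), len(shot[0])
    best = None
    for oy in range(-shot_h + 1, map_h):
        for ox in range(-shot_w + 1, map_w):
            score = count_matches(map_px, shot, ox, oy)
            if score > 0 and (best is None or score > best[2]):
                best = (ox, oy, score)
    return best
```

The two nested loops in `best_overlap` visit (map_w + shot_w - 1) * (map_h + shot_h - 1) offsets, which is where the growth with map area comes from.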
There are a few reasons why I didn't go with that:
1. As the map gets bigger, the number of possible overlap positions gets extremely large (although at least it's proportional to the map area rather than any higher power)
2. It can't handle extremely repetitive images, e.g. a screen full of the same repeating block
3. It can't handle sprites, or background animation, or parallax, at all
4. I was writing it in Delphi which isn't super-fast, and I liked making it have a GUI and be watchable even if that slows it by a fraction
5. I was working on the principle of having high-frame-rate screengrabs as the source data the whole time
Possible caveats to these:
- You could use some smoothing/subsampling to rapidly find suitable stitch locations, somewhat like the way photo stitching works, but that sounds quite tricky
- You could refuse to stitch images in places where all the pixels are known, to reduce the search space
- If #3 is acceptable, the "discard location" threshold is just 1 non-matching pixel, which makes each check much faster than mine
The sample looks really good. Can you say how many source images it took?
I found that implementing a mask image (applied to input images, to get rid of the player sprite/status bars/etc) made a surprisingly big difference. I recommend you try it.
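In case it helps, here is a minimal sketch of what I mean by a mask, in Python rather than my Delphi code. The names (`apply_mask`, `pixels_agree`) and the convention that `True` in the mask means "ignore this pixel" are assumptions for illustration only.

```python
from typing import List, Optional

Pixel = Optional[int]      # None means "ignore this pixel" (assumption)
Grid = List[List[Pixel]]


def apply_mask(shot: Grid, mask: List[List[bool]]) -> Grid:
    """Return a copy of the screenshot with masked pixels replaced by None."""
    same_shape = len(mask) == len(shot) and all(
        len(mask_row) == len(row) for mask_row, row in zip(mask, shot)
    )
    if not same_shape:
        raise ValueError("mask and screenshot must have the same dimensions")
    return [
        [None if hide else value for value, hide in zip(row, mask_row)]
        for row, mask_row in zip(shot, mask)
    ]


def pixels_agree(map_value: Pixel, shot_value: Pixel) -> bool:
    """Unknown map pixels and masked screenshot pixels never count as a mismatch."""
    return map_value is None or shot_value is None or map_value == shot_value
```

The same mask is reused for every frame, so you'd draw it once over wherever the status bar and player sprite usually sit.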