Training-Free Model Merging for Multi-target Domain Adaptation

Wenyi Li*1, Huan-ang Gao*1, Mingju Gao1, Beiwen Tian1, Rong Zhi2,
Hao Zhao†1
1 Institute for AI Industry Research (AIR), Tsinghua University    
2 Mercedes-Benz Group China Ltd.
*Indicates Equal Contribution
†Indicates Corresponding Author

Comparison of Domain Adaptation Settings. (a) Single Target Domain Adaptation (STDA) leverages labeled synthetic data together with unlabeled data from a single target domain for optimal performance in that domain. (b) Multi-target Domain Adaptation (MTDA) with data access uses data from all target domains jointly to train a single model capable of excelling across all of them. (c) MTDA without direct access to training data employs model merging to combine independently adapted models.

Abstract

In this paper, we study multi-target domain adaptation of scene understanding models. While previous methods achieved commendable results through inter-domain consistency losses, they often assumed unrealistic simultaneous access to images from all target domains, overlooking constraints such as data transfer bandwidth limitations and data privacy concerns. Given these challenges, we pose the question: How to merge models adapted independently on distinct domains while bypassing the need for direct access to training data? Our solution to this problem involves two components, merging model parameters and merging model buffers (i.e., normalization layer statistics). For merging model parameters, empirical analyses of mode connectivity surprisingly reveal that linear merging suffices when employing the same pretrained backbone weights for adapting separate models. For merging model buffers, we model the real-world distribution with a Gaussian prior and estimate new statistics from the buffers of separately trained models. Our method is simple yet effective, achieving comparable performance with data combination training baselines, while eliminating the need for accessing training data.

Our code release is undergoing a review process within the company of our co-authors due to regulations. If you encounter problems when trying to reproduce our results or have questions about our implementation, feel free to contact us :)

Method

Overview of Two-stage Pipeline of Our Proposed Multi-target Domain Adaptation Solution. After training STDA methods on separate domains, we integrate models together using our proposed merging techniques (parameter merging + buffer merging).


Parameter Merging

Results of Git Re-Basin and Mid-Point Merging on Different Backbones. In our domain adaptation scenario, Git Re-Basin reduces to a straightforward mid-point merging approach.
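Mid-point merging amounts to averaging the two adapted models' parameters entry-wise. The sketch below is a minimal illustration over scalar parameters (real models would operate on per-layer tensors), not the authors' exact implementation:

```python
def merge_parameters(state_a, state_b, alpha=0.5):
    """Linearly interpolate two parameter dicts: (1 - alpha) * a + alpha * b.

    alpha = 0.5 gives mid-point merging.
    """
    assert state_a.keys() == state_b.keys(), "models must share an architecture"
    return {k: (1 - alpha) * state_a[k] + alpha * state_b[k] for k in state_a}


# Toy example: two "models", each a dict of scalar parameters.
model_c = {"backbone.w": 1.0, "head.b": 0.2}   # adapted to domain C
model_i = {"backbone.w": 3.0, "head.b": 0.6}   # adapted to domain I
merged = merge_parameters(model_c, model_i)     # mid-point of the two
```

With PyTorch modules, the same loop would run over `state_dict()` tensors and the result would be loaded back with `load_state_dict`.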


Empirical Analysis for Linear Mode Connectivity. (a) Exploring the linear mode connectivity of two trained ResNet101 backbones targeted at two different domains. (b-e) Ablation studies on synthetic data, self-training architecture, initialization weights and pretrained weights to find the cause of the linear mode connectivity.
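Linear mode connectivity can be probed by evaluating models interpolated along the straight line between the two adapted solutions: a flat loss (or accuracy) curve along the path justifies linear merging. A minimal sketch, where the `evaluate` callback is hypothetical and stands in for running the interpolated model on a validation set:

```python
def lmc_sweep(state_a, state_b, evaluate, num_points=11):
    """Evaluate models interpolated between two parameter dicts.

    Returns a list of (alpha, score) pairs; a barrier shows up as a
    bump (for loss) or dip (for accuracy) at intermediate alphas.
    """
    results = []
    for i in range(num_points):
        alpha = i / (num_points - 1)
        merged = {k: (1 - alpha) * state_a[k] + alpha * state_b[k]
                  for k in state_a}
        results.append((alpha, evaluate(merged)))
    return results


# Toy example with a quadratic "loss" centered between the endpoints.
sweep = lmc_sweep({"w": 0.0}, {"w": 2.0},
                  evaluate=lambda p: (p["w"] - 1.0) ** 2)
```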


Buffer Merging

Illustration of Merging Statistics in Batch Normalization (BN) Layers. We are provided with two sets of means and variances of data points sampled from a shared Gaussian prior, along with the sizes of these sets.
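Under a Gaussian prior, merged statistics can be recovered in closed form from the per-domain means, variances, and set sizes via the standard pooled-moment identities: the merged mean is the size-weighted mean, and the merged variance follows from pooling second moments. A minimal sketch with scalar statistics for illustration (actual BN buffers are per-channel vectors, handled element-wise the same way):

```python
def merge_bn_stats(mu1, var1, n1, mu2, var2, n2):
    """Pool the running mean/variance of two BN buffers.

    Treats the buffers as moments of n1 and n2 samples and combines
    them exactly:
        mu  = (n1*mu1 + n2*mu2) / (n1 + n2)
        var = E[x^2] - mu^2, with E[x^2] pooled from var_i + mu_i^2.
    """
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    var = (n1 * (var1 + mu1 ** 2) + n2 * (var2 + mu2 ** 2)) / n - mu ** 2
    return mu, var
```

Note that the pooled variance exceeds the average of the two input variances whenever the two means differ, capturing the between-domain spread.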

Results

Quantitative Results

Performance Comparison of Our Method and Baselines. The mIoU (mean Intersection-over-Union) represents the average IoU across 19 categories. 'Enc.' denotes the encoder architecture, with 'R' representing ResNet101 and 'V' indicating MiT-B5. The 'Metric' column specifies whether evaluation was conducted on the Cityscapes ('C') or IDD ('I') dataset. The harmonic mean ('H'), representing adaptation ability across the two domains, is considered as the primary metric. Bold text highlights the best harmonic mean results, while underlined text indicates the second-best results. † signifies only merging backbones while keeping separate decode heads.
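The harmonic mean used as the primary metric above penalizes imbalance between the two domain scores more strongly than an arithmetic mean, so a merged model must do well on both domains to score highly. A minimal sketch:

```python
def harmonic_mean(scores):
    """Harmonic mean of per-domain mIoU scores (all scores must be > 0)."""
    return len(scores) / sum(1.0 / s for s in scores)


# Balanced scores keep the harmonic mean high; an imbalanced pair with
# the same arithmetic mean scores lower.
balanced = harmonic_mean([50.0, 50.0])      # 50.0
imbalanced = harmonic_mean([70.0, 30.0])    # 42.0
```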



Comparison of Our Method with State-of-the-Art Approaches. Prior MTDA methods use training schemes different from ours and are listed for reference only. † signifies results reproduced by us.



Application of Our Model Merging Techniques Across Four Target Domains. The datasets Cityscapes, IDD, ACDC, and DarkZurich are represented by 'C', 'I', 'A', and 'D', respectively. The mIoU of each dataset and the harmonic mean (H) are reported.




Qualitative Results

Visualization Results for GTA to Cityscapes and IDD. (a) Test images from Cityscapes and IDD. We visualize results of (b) single-target domain adaptation (STDA) trained on the Cityscapes target, (c) STDA trained on the IDD target, and (d) our model merging method. (e) Ground-truth segmentation maps.

BibTeX

If you find our work useful in your research, please consider citing:
@inproceedings{li2024training,
    title={Training-Free Model Merging for Multi-target Domain Adaptation},
    author={Li, Wenyi and Gao, Huan-ang and Gao, Mingju and Tian, Beiwen and Zhi, Rong and Zhao, Hao},
    booktitle={European Conference on Computer Vision},
    year={2024},
    organization={Springer}
}