In this work, we present a self-supervised wireless localization method that operates on unlabeled channel estimates. We aim to learn general-purpose channel features that are robust to fading and system impairments. The learned representations can transfer to new environments and are ready to use for other wireless downstream tasks. The proposed method is the first joint-embedding self-supervised approach that does not depend on contrastive channel estimates. In small-data regimes, our approach outperforms fully supervised techniques under fine-tuning and, in some cases, under linear evaluation. We assess performance in centralized and distributed massive MIMO systems on multiple datasets. Moreover, our method works both indoors and outdoors without additional assumptions or design changes.