Publications - Current Year

2026

  1. “Align Once to Explain: Feature Alignment for Scalable B-cosification of Foundational Vision Transformers,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2026), Denver, CO, USA, 2026.
  2. “Boosting Segment Anything Model to Generalize,” IEEE Transactions on Image Processing, vol. 35, 2026.
  3. “Sports-QA: A Large-Scale Video Question Answering Benchmark for Complex and Professional Sports,” International Journal of Computer Vision, vol. 134, 2026.
  4. “Amplitude Exchanging Network for Unsupervised Underwater Image Enhancement,” Pattern Recognition, vol. 175, 2026.
  5. “GeoDiv: Framework for Measuring Geographical Diversity in Text-to-Image Models,” in The Fourteenth International Conference on Learning Representations (ICLR 2026), Rio de Janeiro, Brazil, 2026.
  6. “Certified Circuits: Stability Guarantees for Mechanistic Circuits,” 2026. [Online]. Available: https://arxiv.org/abs/2602.22968.
  7. “Interpretability Without Tradeoffs: Disentangling Polysemanticity At Equal Predictive Performance,” 2026. [Online]. Available: https://arxiv.org/abs/2605.31304.
  8. “More Images, More Problems? A Controlled Analysis of VLM Failure Modes,” 2026. [Online]. Available: https://arxiv.org/abs/2601.07812.
  9. “Do Instance Priors Help Weakly Supervised Semantic Segmentation?,” 2026. [Online]. Available: https://arxiv.org/abs/2604.11170.
  10. “Rewis3d: Reconstruction Improves Weakly-Supervised Semantic Segmentation,” 2026. [Online]. Available: https://arxiv.org/abs/2603.06374.
  11. “RAWDet-7: A Multi-Scenario Benchmark for Object Detection and Description on Quantized RAW Images,” 2026. [Online]. Available: https://arxiv.org/abs/2602.03760.
  12. “What is Missing? Explaining Neurons Activated by Absent Concepts,” 2026. [Online]. Available: https://arxiv.org/abs/2603.09787.
  13. “What Matters for Scalable and Robust Learning in End-to-End Driving Planners?,” 2026. [Online]. Available: https://arxiv.org/abs/2603.15185.
  14. “PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding,” 2026. [Online]. Available: https://arxiv.org/abs/2605.30126.
  15. “ClipTTT: CLIP-Guided Test-Time Training Helps LVLMs See Better,” 2026. [Online]. Available: https://arxiv.org/abs/2603.26486.
  16. “From Codebooks to VLMs: Evaluating Automated Visual Discourse Analysis for Climate Change on Social Media,” 2026. [Online]. Available: https://arxiv.org/abs/2604.21786.
  17. “MM-TS: Multi-Modal Temperature and Margin Schedules for Contrastive Learning with Long-Tail Data,” 2026. [Online]. Available: https://arxiv.org/abs/2603.08202.
  18. “Insight: Interpretable Semantic Hierarchies in Vision-Language Encoders,” 2026. [Online]. Available: https://arxiv.org/abs/2601.13798.
  19. “DAVE: Distribution-aware Attribution via ViT Gradient Decomposition,” 2026. [Online]. Available: https://arxiv.org/abs/2602.06613.
  20. “SSL-R1: Self-Supervised Visual Reinforcement Post-Training for Multimodal Large Language Models,” 2026. [Online]. Available: https://arxiv.org/abs/2604.20705.
  21. “R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs,” 2026. [Online]. Available: https://arxiv.org/abs/2604.20696.
  22. “ClimateVID -- Social Media Videos Analysis and Challenges Involved,” 2026.
  23. “Seeing Through Circuits: Faithful Mechanistic Interpretability for Vision Transformers,” 2026. [Online]. Available: https://arxiv.org/abs/2604.14477.

2025

  1. “γ-Quant: Towards Learnable Quantization for Low-bit Pattern Recognition,” in Pattern Recognition (DAGM GCPR 2025), Freiburg, Germany, 2026.
  2. “MT-Occ: Single-View 3D Occupancy Prediction via Multi-task Distillation,” in Pattern Recognition (DAGM GCPR 2025), Freiburg, Germany, 2026.