Case Study: Object-centric open-vocabulary image retrieval with aggregated features

Monday, September 29

09:45 AM - 10:15 AM

Live in Berlin

Open-vocabulary object-centric image retrieval is the task of retrieving images that contain a specified object of interest, described by an open-set text query. As working with large image datasets becomes standard, solving this task efficiently has gained significant practical importance. Applications include targeted performance analysis of retrieved images using ad-hoc queries and hard example mining during training. Recent advances in contrastive open-vocabulary systems have enabled large-scale open-vocabulary image retrieval. However, these approaches use a single global embedding per image, which limits the system’s ability to retrieve images containing relatively small object instances. The alternative, incorporating local embeddings from detection pipelines, faces scalability challenges that make it unsuitable for retrieval from large databases.

In this work, we present a simple yet effective approach to object-centric open-vocabulary image retrieval. Our approach aggregates dense embeddings extracted from CLIP into a compact representation, essentially combining the scalability of image retrieval pipelines with the object identification capabilities of dense detection methods. We demonstrate the effectiveness of our scheme by achieving significantly better results than global-feature approaches on three datasets, increasing accuracy by up to 15 mAP points. We further integrate our scheme into a large-scale retrieval framework and demonstrate our method’s advantages in scalability and interpretability.
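The contrast between a single global embedding and a few aggregated dense embeddings can be illustrated with a toy sketch. The abstract does not specify the aggregation rule, so the k-means clustering used here is an assumption, as are the 3-dimensional vectors standing in for CLIP patch embeddings and every function name. It is meant only to show why a small object can vanish in a global average yet survive in a compact set of aggregated vectors.

```python
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def mean(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def aggregate(dense, k, iters=10):
    """Toy k-means over unit-length patch embeddings.

    Assumption: clustering is used as the aggregation step purely for
    illustration; the source only says dense embeddings are aggregated.
    """
    dense = [normalize(v) for v in dense]
    # Deterministic farthest-point initialization.
    centers = [dense[0]]
    while len(centers) < k:
        far = min(dense, key=lambda v: max(cosine(v, c) for c in centers))
        centers.append(far)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in dense:
            best = max(range(len(centers)), key=lambda j: cosine(v, centers[j]))
            groups[best].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [normalize(mean(g)) if g else c for g, c in zip(groups, centers)]
    return centers

def score(query, reps):
    """Image score = best cosine similarity over its representation vectors."""
    return max(cosine(query, r) for r in reps)

# Hypothetical image: 15 background patches and 1 small-object patch.
background = [[1.0, 0.0, 0.05 * (i % 3)] for i in range(15)]
small_object = [0.0, 1.0, 0.0]
patches = background + [small_object]

query = [0.0, 1.0, 0.0]  # stand-in for the text embedding of the object

global_rep = mean([normalize(v) for v in patches])  # one vector per image
reps = aggregate(patches, k=2)                      # two vectors per image

global_score = score(query, [global_rep])
aggregated_score = score(query, reps)
print(f"global: {global_score:.3f}  aggregated: {aggregated_score:.3f}")
```

In this toy setting the object patch is outvoted 15 to 1 in the global average, while the aggregated representation keeps a dedicated vector for it at the cost of storing two vectors instead of one.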

In this session, you will learn more about:

The challenges and motivation for open-vocabulary object-centric image retrieval when dealing with large automotive image databases
How to effectively index images for retrieval by aggregating dense embeddings extracted from CLIP image encodings
How to use the proposed pipeline for large-scale rare-object retrieval
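As a rough illustration of the indexing topic listed above, the sketch below stores several aggregated vectors per image in a flat index and ranks images by their best-matching vector. The `FlatIndex` class, the image identifiers, and the 3-dimensional vectors are all hypothetical, and the brute-force scan is a stand-in: a large-scale deployment would presumably use an approximate nearest-neighbor index instead.

```python
import heapq
import math

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

class FlatIndex:
    """Hypothetical brute-force index: many vectors per image, one id each."""

    def __init__(self):
        self.vectors = []
        self.image_ids = []

    def add(self, image_id, reps):
        # Every aggregated vector is stored alongside its source image id.
        for r in reps:
            self.vectors.append(normalize(r))
            self.image_ids.append(image_id)

    def search(self, query, top_k):
        q = normalize(query)
        best = {}  # image id -> best similarity over that image's vectors
        for vec, image_id in zip(self.vectors, self.image_ids):
            s = sum(x * y for x, y in zip(q, vec))
            if s > best.get(image_id, -2.0):
                best[image_id] = s
        return heapq.nlargest(top_k, best.items(), key=lambda kv: kv[1])

index = FlatIndex()
index.add("img_a", [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])  # contains the object
index.add("img_b", [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])  # weak match only
index.add("img_c", [[0.0, 0.0, 1.0]])                   # unrelated content

results = index.search([0.0, 1.0, 0.0], top_k=2)
for image_id, similarity in results:
    print(image_id, round(similarity, 3))
```

Because each hit traces back to one specific stored vector, a ranking of this kind also indicates which part of the image matched the query, which is one plausible reading of the interpretability benefit mentioned in the abstract.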
