towhee.models.collaborative_experts.collaborative_experts.sharded_cross_view_inner_product
- towhee.models.collaborative_experts.collaborative_experts.sharded_cross_view_inner_product(vid_embds, text_embds, text_weights, subspaces, l2renorm, ind, keep_missing_modalities, merge_caption_similiarities='avg', tol=1e-05, raw_captions=None)
Compute a similarity matrix from sharded vectors.
- Parameters:
vid_embds (dict) – The set of video sub-embeddings (shards) that, when concatenated, form the full embedding. The ith shard has shape B x K x F_i, i.e. shards can differ in the last dimension.
text_embds (dict) – Text sub-embeddings in the same format.
text_weights (torch.Tensor) – Weights for the shards in text_embds.
l2renorm (bool) – Whether to L2-renormalize the full (concatenated) embeddings.
- Returns:
Similarity matrix of size BK x BK.
- Return type:
(torch.Tensor)
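To make the idea of a "similarity from sharded vectors" concrete, here is a minimal, dependency-free sketch. It is not the towhee implementation: it assumes each shard weight scales the corresponding text shard, that the similarity is the weighted sum of per-shard inner products, and that, when renormalization is on, both sides are divided by the L2 norm of the full concatenated (and, for text, weighted) embedding. Under those assumptions the result is the cosine similarity between the weighted text concatenation and the video concatenation. The names `sharded_similarity` and `eps`, and the per-shard dict layout of `text_weights`, are hypothetical choices for illustration.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def sharded_similarity(
    vid_shards: Dict[str, Matrix],         # shard name -> one row (length F_i) per video
    text_shards: Dict[str, Matrix],        # shard name -> one row (length F_i) per text
    text_weights: Dict[str, List[float]],  # hypothetical layout: shard name -> one weight per text row
    l2renorm: bool = True,
    eps: float = 1e-6,                     # hypothetical floor on the squared norm
) -> Matrix:
    """Return a num_texts x num_videos matrix of weighted shard-wise inner products."""
    names = list(vid_shards)
    n_vid = len(vid_shards[names[0]])
    n_text = len(text_shards[names[0]])

    if l2renorm:
        # Norm of each full video embedding (all shards concatenated).
        vid_norm = [
            math.sqrt(max(sum(x * x for n in names for x in vid_shards[n][v]), eps))
            for v in range(n_vid)
        ]
        # Norm of each full text embedding after scaling every shard by its weight.
        text_norm = [
            math.sqrt(max(sum((text_weights[n][t] * x) ** 2
                              for n in names for x in text_shards[n][t]), eps))
            for t in range(n_text)
        ]
    else:
        vid_norm = [1.0] * n_vid
        text_norm = [1.0] * n_text

    sims = [[0.0] * n_vid for _ in range(n_text)]
    for n in names:
        for t in range(n_text):
            w = text_weights[n][t]
            for v in range(n_vid):
                # Inner product restricted to shard n; shards never mix dimensions.
                dot = sum(a * b for a, b in zip(text_shards[n][t], vid_shards[n][v]))
                sims[t][v] += w * dot / (text_norm[t] * vid_norm[v])
    return sims
```

Because the weighted per-shard dot products add up to one inner product over the concatenated vectors, the shards never need to be concatenated explicitly, which is the practical appeal of working with sharded embeddings.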
NOTE: If multiple captions are provided per video, their similarities can be aggregated into a single video-text similarity score.
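One plausible way to aggregate per-caption scores, suggested by the default `merge_caption_similiarities='avg'`, is to average them. The sketch below assumes the similarity matrix has one row per caption, with all `num_caps` captions of the same item stored in consecutive rows; that layout and the helper name `average_caption_similarities` are assumptions for illustration, not taken from towhee.

```python
from typing import List

Matrix = List[List[float]]


def average_caption_similarities(sims: Matrix, num_caps: int) -> Matrix:
    """Collapse a (num_items * num_caps) x num_videos matrix to num_items x num_videos.

    Hypothetical helper: assumes the captions of one item occupy consecutive rows.
    """
    if num_caps <= 0 or len(sims) % num_caps != 0:
        raise ValueError("row count must be a positive multiple of num_caps")
    merged = []
    for start in range(0, len(sims), num_caps):
        group = sims[start:start + num_caps]  # every caption row for one item
        # Column-wise mean: one averaged score per video.
        merged.append([sum(col) / num_caps for col in zip(*group)])
    return merged
```

For example, `average_caption_similarities([[1, 3], [3, 5], [0, 2], [2, 0]], 2)` gives `[[2.0, 4.0], [1.0, 1.0]]`.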