towhee.models.collaborative_experts.collaborative_experts.sharded_cross_view_inner_product
- towhee.models.collaborative_experts.collaborative_experts.sharded_cross_view_inner_product(vid_embds, text_embds, text_weights, subspaces, l2renorm, ind, keep_missing_modalities, merge_caption_similiarities='avg', tol=1e-05, raw_captions=None)
Compute a similarity matrix from sharded vectors.
- Parameters:
embds1 (dict[str, torch.Tensor]) – the set of sub-embeddings that, when concatenated, form the whole. The ith shard has shape B x K x F_i (i.e. shards can differ in the last dimension).
embds2 (dict[str, torch.Tensor]) – same format as embds1.
weights2 (torch.Tensor) – weights for the shards in embds2.
l2norm (bool::True) – whether to l2 renormalize the full embeddings.
- Returns:
similarity matrix of size BK x BK.
- Return type:
torch.Tensor
NOTE: If multiple captions are provided, their similarities can be aggregated into a single video-text similarity score (see merge_caption_similiarities, which defaults to 'avg').
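The parameter descriptions above can be illustrated with a minimal sketch. It is not the towhee implementation: `sharded_similarity_sketch` is a hypothetical helper, the shard keys `"rgb"` and `"audio"` are made up, and the `(B*K) x num_shards` layout of `weights2` is an assumption. The docstring names `embds1`, `embds2`, `weights2` and `l2norm` presumably correspond to `vid_embds`, `text_embds`, `text_weights` and `l2renorm` in the signature, but that mapping is also an assumption. The sketch ignores `subspaces`, `ind`, `keep_missing_modalities` and caption merging entirely.

```python
import torch


def sharded_similarity_sketch(embds1, embds2, weights2, l2norm=True):
    """Illustrative only: weighted inner product over concatenated shards."""
    keys = sorted(embds1)  # fixed shard order shared by both sides
    # Flatten each B x K x F_i shard to (B*K) x F_i.
    flat1 = [embds1[k].reshape(-1, embds1[k].shape[-1]) for k in keys]
    # Scale each shard of embds2 by its per-row weight (column i of weights2).
    flat2 = [
        weights2[:, i : i + 1] * embds2[k].reshape(-1, embds2[k].shape[-1])
        for i, k in enumerate(keys)
    ]
    # Concatenating the shards forms the "whole" embeddings.
    full1 = torch.cat(flat1, dim=1)
    full2 = torch.cat(flat2, dim=1)
    if l2norm:
        # Renormalize the full embeddings to unit L2 norm.
        full1 = torch.nn.functional.normalize(full1, dim=1)
        full2 = torch.nn.functional.normalize(full2, dim=1)
    # Row r, column c: inner product of row r of embds1 with row c of embds2.
    return full1 @ full2.t()


torch.manual_seed(0)
B, K = 3, 2
embds1 = {"rgb": torch.randn(B, K, 4), "audio": torch.randn(B, K, 6)}
embds2 = {"rgb": torch.randn(B, K, 4), "audio": torch.randn(B, K, 6)}
weights2 = torch.softmax(torch.randn(B * K, 2), dim=1)

sims = sharded_similarity_sketch(embds1, embds2, weights2)
print(sims.shape)  # torch.Size([6, 6]), i.e. BK x BK
```

Without renormalization, the product of the concatenated embeddings equals the sum of the per-shard weighted inner products, which is why shards with different feature sizes F_i can be combined into one similarity matrix.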