towhee.models.collaborative_experts.collaborative_experts.sharded_cross_view_inner_product

towhee.models.collaborative_experts.collaborative_experts.sharded_cross_view_inner_product(vid_embds, text_embds, text_weights, subspaces, l2renorm, ind, keep_missing_modalities, merge_caption_similiarities='avg', tol=1e-05, raw_captions=None)

Compute a similarity matrix from sharded vectors.

Parameters:
  • vid_embds (dict) – The set of sub-embeddings (shards) that, when concatenated, form the whole embedding. The ith shard has shape B x K x F_i, so shards may differ only in the last dimension.

  • text_embds (dict) – Same format as vid_embds.

  • text_weights (torch.Tensor) – Weights for the shards in text_embds.

  • l2renorm (bool) – Whether to L2-renormalize the full embeddings.

Returns:

Similarity matrix of size BK x BK.

Return type:

(torch.Tensor)

NOTE: If multiple captions are provided, their similarities can be aggregated into a single video-text similarity score; the merge_caption_similiarities argument defaults to 'avg'.
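
The idea of a similarity matrix built from sharded vectors can be illustrated with a minimal sketch. This is not the towhee implementation: the helper name sharded_similarity_sketch, its parameter names (shards_a, shards_b, weights_b), the weight layout (one column per shard, in dict order), and the row/column orientation of the result are all assumptions made for illustration. It ignores subspaces, ind, keep_missing_modalities, tol, and caption merging entirely.

```python
import torch
import torch.nn.functional as F


def sharded_similarity_sketch(shards_a, shards_b, weights_b, l2norm=True):
    """Hypothetical helper: similarity between two sets of sharded embeddings.

    shards_a, shards_b: dicts mapping shard name -> tensor of shape (B, K, F_i).
    weights_b: tensor with B*K elements per shard, reshaped to (B*K, num_shards);
        column i is assumed to weight the i-th shard of shards_b (dict order).
    Returns a (B*K, B*K) matrix; rows index shards_a, columns index shards_b.
    """
    names = list(shards_a)
    first = shards_a[names[0]]
    n = first.shape[0] * first.shape[1]  # B * K flattened items

    # Concatenate the shards of the first set into full embeddings.
    full_a = torch.cat([shards_a[m].reshape(n, -1) for m in names], dim=1)

    # Scale each shard of the second set by its weight, then concatenate.
    w = weights_b.reshape(n, len(names))
    full_b = torch.cat(
        [w[:, i:i + 1] * shards_b[m].reshape(n, -1) for i, m in enumerate(names)],
        dim=1,
    )

    if l2norm:
        # Renormalize the full (concatenated) embeddings, not each shard.
        full_a = F.normalize(full_a, dim=1)
        full_b = F.normalize(full_b, dim=1)

    # Inner product between every pair of full embeddings.
    return full_a @ full_b.t()


torch.manual_seed(0)
B, K = 2, 3
shards = {"rgb": torch.randn(B, K, 4), "audio": torch.randn(B, K, 5)}
weights = torch.ones(B * K, len(shards))
sims = sharded_similarity_sketch(shards, shards, weights)
print(sims.shape)
```

Because the inner product of two concatenated vectors equals the sum of the per-shard inner products, weighting the shards before concatenation is equivalent to taking a weighted sum of per-shard similarities.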