towhee.models.collaborative_experts.collaborative_experts.sharded_cross_view_inner_product

towhee.models.collaborative_experts.collaborative_experts.sharded_cross_view_inner_product(vid_embds, text_embds, text_weights, subspaces, l2renorm, ind, keep_missing_modalities, merge_caption_similiarities='avg', tol=1e-05, raw_captions=None)

Compute a similarity matrix from sharded vectors.

Parameters:
  • embds1 (dict[str, torch.Tensor]) – the set of sub-embeddings (shards) that, when concatenated, form the whole embedding. The ith shard has shape B x K x F_i (i.e. shards can differ in the last dimension).

  • embds2 (dict[str, torch.Tensor]) – same format as embds1.

  • weights2 (torch.Tensor) – weights for the shards in embds2.

  • l2norm (bool::True) – whether to l2 renormalize the full embeddings.

Returns:

similarity matrix of size BK x BK.

Return type:

torch.Tensor
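
The parameter and return descriptions above can be illustrated with a minimal sketch. This is not towhee's implementation: `sharded_similarity` is a hypothetical helper, the weights are assumed to have shape BK x M (one weight per flattened row of embds2 and per shard), and rows of the result are assumed to index embds1 while columns index embds2. The idea it shows is that a weighted sum of per-shard inner products gives the same result as an inner product of the concatenated embeddings, without building the concatenation for the matrix product.

```python
import torch


def sharded_similarity(embds1, embds2, weights2, l2norm=True, tol=1e-5):
    """Hypothetical sketch: similarity matrix from sharded embeddings.

    embds1, embds2: dict[str, Tensor], each shard of shape B x K x F_i.
    weights2: Tensor of shape (B*K) x M, an assumed layout with one weight
        per flattened row of embds2 and per shard (M = number of shards).
    Returns a (B*K) x (B*K) tensor; rows index embds1, columns index embds2.
    """
    names = sorted(embds1)  # fixed shard order shared by both dicts

    # Flatten each B x K x F_i shard to (B*K) x F_i.
    flat1 = {n: embds1[n].reshape(-1, embds1[n].shape[-1]) for n in names}
    flat2 = {n: embds2[n].reshape(-1, embds2[n].shape[-1]) for n in names}

    if l2norm:
        # Norm of the full (concatenated) embedding, clamped to avoid
        # division by zero for all-zero rows.
        norm1 = torch.cat([flat1[n] for n in names], dim=1).norm(dim=1, keepdim=True)
        norm2 = torch.cat([flat2[n] for n in names], dim=1).norm(dim=1, keepdim=True)
        flat1 = {n: v / norm1.clamp(min=tol) for n, v in flat1.items()}
        flat2 = {n: v / norm2.clamp(min=tol) for n, v in flat2.items()}

    first = flat1[names[0]]
    sims = first.new_zeros(first.shape[0], flat2[names[0]].shape[0])
    for idx, n in enumerate(names):
        # 1 x BK row of weights, broadcast over the rows of the product.
        w = weights2[:, idx].unsqueeze(0)
        sims = sims + w * (flat1[n] @ flat2[n].T)
    return sims


# Usage with two shards of different feature sizes (B=2, K=3).
shards = {"a": torch.randn(2, 3, 4), "b": torch.randn(2, 3, 5)}
uniform_weights = torch.ones(6, 2)
similarity = sharded_similarity(shards, shards, uniform_weights)
```

With uniform weights and `l2norm=False`, the result matches the plain inner product of the concatenated shards, which is a convenient sanity check for this kind of code.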

NOTE: If multiple captions are provided, we can aggregate their similarities to provide a single video-text similarity score.
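
As a rough illustration of the aggregation described in the note, the sketch below averages per-caption similarities into one score per text-video pair, in the spirit of the 'avg' default in the signature. The function name, the B x K x V tensor layout, and the optional validity mask are assumptions made for this example and are not taken from the towhee source.

```python
import torch


def average_caption_similarities(caption_sims, valid=None):
    """Hypothetical sketch: merge K caption similarities by averaging.

    caption_sims: Tensor of shape B x K x V (B texts, K captions per text,
        V videos); this layout is an assumption for illustration.
    valid: optional B x K mask, nonzero where a caption actually exists.
    Returns a B x V tensor with one similarity per text-video pair.
    """
    if valid is None:
        return caption_sims.mean(dim=1)

    mask = valid.to(caption_sims.dtype).unsqueeze(-1)  # B x K x 1
    # Number of real captions per text, at least 1 to avoid dividing by zero.
    counts = mask.sum(dim=1).clamp(min=1.0)            # B x 1
    return (caption_sims * mask).sum(dim=1) / counts


# Usage: 2 texts, 2 captions each, 2 videos; the second text has one caption.
sims = torch.tensor([[[1.0, 3.0], [3.0, 5.0]],
                     [[2.0, 2.0], [100.0, 100.0]]])
mask = torch.tensor([[1, 1], [1, 0]])
merged = average_caption_similarities(sims, mask)
```

Masking before averaging keeps padded or missing captions from distorting the score of texts that have fewer than K captions.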