towhee.models.collaborative_experts.collaborative_experts.sharded_cross_view_inner_product(vid_embds, text_embds, text_weights, subspaces, l2renorm, ind, keep_missing_modalities, merge_caption_similiarities='avg', tol=1e-05, raw_captions=None)

Compute a similarity matrix from sharded vectors.

Parameters:

  • vid_embds (dict) – The set of sub-embeddings (shards) that, when concatenated, form the whole embedding. The ith shard has shape B x K x F_i (i.e. shards can differ in the last dimension).

  • text_embds (dict) – Same format as vid_embds.

  • text_weights (torch.Tensor) – Weights for the shards in text_embds.

  • l2renorm (bool) – Whether to L2-renormalize the full (concatenated) embeddings.
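The idea behind the two embedding dictionaries and the shard weights described above can be illustrated with a minimal sketch. This is not the library's implementation: the function name `sharded_similarity_sketch`, its argument names, and the assumed weight layout (one row per flattened embedding, one column per shard, in sorted shard-name order) are hypothetical choices made for illustration. The remaining arguments of the real function (subspaces, ind, keep_missing_modalities, tol, raw_captions) are not modelled.

```python
import torch
import torch.nn.functional as F


def sharded_similarity_sketch(embds_a, embds_b, weights_b, renorm=True):
    """Illustrative only: similarity matrix from two dicts of shards.

    embds_a, embds_b: dict mapping shard name -> tensor of shape (B, K, F_i).
    weights_b: tensor of shape (B * K, num_shards); ASSUMED layout, with
        columns ordered by sorted shard name.
    renorm: if True, L2-normalize the concatenated embeddings.
    """
    names = sorted(embds_a)
    first = embds_a[names[0]]
    rows = first.shape[0] * first.shape[1]  # B * K

    # Flatten every shard from (B, K, F_i) to (B * K, F_i).
    flat_a = [embds_a[n].reshape(rows, -1) for n in names]
    # Scale each shard of the second set by its per-row weight.
    flat_b = [
        embds_b[n].reshape(rows, -1) * weights_b[:, i:i + 1]
        for i, n in enumerate(names)
    ]

    # Concatenate the shards along the feature dimension to form the whole.
    full_a = torch.cat(flat_a, dim=1)
    full_b = torch.cat(flat_b, dim=1)

    if renorm:
        full_a = F.normalize(full_a, dim=1)
        full_b = F.normalize(full_b, dim=1)

    # Inner products between every pair of rows: shape (B * K, B * K).
    return full_a @ full_b.t()


# Usage with two hypothetical shards of different feature sizes.
torch.manual_seed(0)
shards = {"audio": torch.randn(2, 3, 5), "rgb": torch.randn(2, 3, 4)}
weights = torch.ones(6, 2)
sim = sharded_similarity_sketch(shards, shards, weights)
```

Because the inner product of two concatenated vectors equals the sum of the per-shard inner products, weighting a shard before concatenation is equivalent to weighting that shard's contribution to the final similarity.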


Returns:

Similarity matrix of size BK x BK.

Return type:

torch.Tensor

NOTE: If multiple captions are provided, we can aggregate their similarities to provide a single video-text similarity score.
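The aggregation mentioned in the note can be sketched as a simple average, which is what the default `merge_caption_similiarities='avg'` suggests. The helper below is hypothetical and not part of towhee; it assumes that the K caption rows belonging to each text item are stored contiguously, and it ignores missing captions.

```python
import torch


def average_caption_similarities(sims, num_captions):
    """Illustrative only: merge per-caption similarities by averaging.

    sims: tensor of shape (B * K, N), where rows b*K .. b*K + K - 1 hold the
        K captions of text item b (ASSUMED row layout).
    num_captions: K, the number of captions per text item.
    Returns a tensor of shape (B, N) with one score per text item.
    """
    total_rows, n = sims.shape
    if total_rows % num_captions != 0:
        raise ValueError("row count must be a multiple of num_captions")
    b = total_rows // num_captions
    # Group the captions of each item, then average over the caption axis.
    return sims.reshape(b, num_captions, n).mean(dim=1)


# Two text items with two captions each, scored against two videos.
per_caption = torch.tensor([[1.0, 0.0],
                            [3.0, 2.0],
                            [0.0, 4.0],
                            [2.0, 6.0]])
merged = average_caption_similarities(per_caption, num_captions=2)
```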