Are you trying to from a specific video-text retrieval model, or
Ground truth or processed metadata for datasets like TVR , ActivityNet , or Charades-STA . SL_Hard_zip
Specifically curated data points used to train models to distinguish between very similar but incorrect matches (crucial for "Hard" contrastive learning). Are you trying to from a specific video-text