A lot of our research and development activities rely on large collections of web media content sourced from social media platforms such as YouTube and Twitter, which are then manually curated and annotated by our researchers to create “ground truth” datasets. These datasets let us train machine learning models on specific tasks and then benchmark them against competing approaches in order to select the best method for each case. It goes without saying that we spend loads of effort on developing scripts for crawling, monitoring, extracting and fetching the data and content relevant to the target task, and then even more effort on curating, cleaning and labeling (aka annotating) the collected datasets. Content labeling in particular is challenging due to the subjective nature of the task (different people may perceive the same content as belonging to different categories), and certain annotation tasks raise additional issues, e.g. when dealing with NSFW and disturbing content.
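
For illustration, here is a minimal sketch of the kind of fetching script we have in mind, pulling basic metadata for a batch of YouTube videos via the YouTube Data API. The video ID and API key are placeholders, and a real collection pipeline would add pagination, rate limiting and error handling on top of this.

```python
import requests

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def fetch_video_metadata(video_ids, api_key):
    """Fetch title, publish date and view count for a batch of YouTube video IDs."""
    params = {
        "part": "snippet,statistics",  # request basic metadata and view statistics
        "id": ",".join(video_ids),
        "key": api_key,
    }
    response = requests.get(API_URL, params=params, timeout=30)
    response.raise_for_status()
    items = response.json().get("items", [])
    return [
        {
            "id": item["id"],
            "title": item["snippet"]["title"],
            "published_at": item["snippet"]["publishedAt"],
            "view_count": item["statistics"].get("viewCount"),
        }
        for item in items
    ]


if __name__ == "__main__":
    # Placeholder video ID and API key, for illustration only.
    print(fetch_video_metadata(["dQw4w9WgXcQ"], api_key="YOUR_API_KEY"))
```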