Using Data Science to Predict Video Popularity | Winner’s Interview with Kazeem Hakeem

Winner Interview

Insights from our Data Science Competition Winner

Photo by Szabo Viktor on Unsplash

Our Video Popularity Prediction Challenge recently came to a close, so we reached out to the 3rd place winner, Kazeem Hakeem. Here are his responses.

Meet Kazeem Hakeem!

Please introduce yourself, including your name and academic/professional background.

My name is Kazeem Hakeem. I am an engineering professional with a Master of Science in Applied Mathematics from the University of Lagos.

Are you currently working as a data scientist?

Yes, at Crown Interactive Limited.

Why did you decide to join this Video Popularity Prediction challenge?

A friend told me about it.

Have you ever participated in a competition like this before?

Yes.

Let’s get Technical

What was your impression of the dataset and problem statement for this competition?

Seeing four different datasets was intimidating, as I did not know where to start, but the problem statement was clear enough. The description in the About section and the guidelines also helped a lot.

Please explain your winning solution and the process you used to build it.

My winning solution was actually quite basic; I did no feature engineering at all, which still surprises me. I used Bayesian optimization for hyperparameter tuning of a LightGBM model. Training and prediction were done using the out-of-fold technique (10 folds).

Why do you think you were able to create a winning solution?

I think I've built a lot of models that overfitted in the past, so I decided to take the advice of the Zen of Python: "Simple is better than complex."

Do you have a standard step-by-step approach that you use in data science competitions like this one? If so, could you share it?

Yes, I mostly start with Bayesian optimization for hyperparameter tuning (XGBoost and LightGBM) on the dataset (which is probably a bad idea, but I like being in the top 20% of a competition with my first two submissions). From there, I use the model that gives me the best result and begin performing feature engineering and selection, along with another round of hyperparameter tuning.

Did you face any problems or difficulties in this challenge? Please explain.

Yes, as stated earlier, there was a lot of data, but the guidelines helped in dealing with that.

Is there anything you would do differently if you could start over on this challenge?

Yes, feature extraction from the other datasets would definitely have helped.

What did you think about this competition overall?

It was erratic; improvements in my offline CV score did not translate into improvements on the public leaderboard.

Words of wisdom

Did you learn anything new by participating in this challenge? If so, what?

Yes, I learned that finding the best subset of data/features is better than using them all. And I think I still need to learn the art of feature selection, which is one of the most important steps in building a model.
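One common way to pick a feature subset, shown here purely as an illustration (synthetic data, not the competition's), is to rank features by a tree model's importances and keep only those above a threshold:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

# 5 informative features deliberately mixed with 15 noise features.
X, y = make_regression(n_samples=400, n_features=20, n_informative=5, random_state=1)

# Fit a forest, then keep only features whose importance beats the median.
selector = SelectFromModel(
    RandomForestRegressor(n_estimators=100, random_state=1),
    threshold="median",
)
X_reduced = selector.fit_transform(X, y)
print(f"Kept {X_reduced.shape[1]} of {X.shape[1]} features")
```

Dropping noisy features this way often improves CV stability, which is exactly the "best subset beats all features" lesson above.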

Do you have any advice for newbies looking to get started on a machine learning challenge like this?

Learn and code daily, follow Data Science/ML professionals on Twitter/LinkedIn, and do a lot of machine learning competitions. Learn from Kaggle notebooks (even though this can be overwhelming at times), and find a study/competition buddy (they help out when you lose your motivation).

That's all of the responses from our winner.

Good news! We have a new competition released — the Viral Tweets Prediction Challenge. Interested in predicting viral tweets with data science? This competition ends on July 6, 2021 so sign up now and test your skills today!

Follow our socials to stay updated!