
Why Data Scientists Should Get Involved with AI Competitions

Circuit to success

Looking back at Atrae and bitgrit’s “#SwipeToSuccess” Competition.

Reflections and Insights

Last year, we hosted the “#SwipeToSuccess” competition in collaboration with Atrae, just as they were introducing their networking application, “Yenta”, into the Indian market. Data scientists from around the globe battled it out, trying to come up with algorithms that would improve the accuracy of Yenta’s matches.

In this article, we’re going to reflect on how the competition went, and go into some of the details of the process, hopefully providing you with some valuable insights.

Table of contents:
- What are the advantages of data science competitions?
- How is a competition organized?
- Closing notes by Mr. Sugiyama (data scientist at Atrae)

What are the advantages of DS competitions?

Bitgrit #SwipeToSuccess competition page

Compared to the conventional model of outsourcing AI solutions to a handful of engineers, hosting an AI competition has numerous advantages.

For one, competitions allow companies that lack the necessary resources and in-house talent to make full use of their data and improve their businesses efficiently. A community of over 30,000 talented data scientists from around the globe actively competes on the bitgrit platform, creating a range of unique algorithms to produce highly accurate models and offer expert technical solutions.

With the arrival of big data leading to a highly competitive market, the accuracy of these AI models can be a crucial factor in determining business survival and progress.

In addition, companies hoping to expand and go global with their services and products could follow Atrae’s example of advertising their brand by hosting a competition and placing their company firmly in the field of data science.

The aim of #SwipeToSuccess

The aim of this competition was to improve the AI algorithm for the business networking application “Yenta”, which was being introduced into the Indian market. Contestants on bitgrit were provided with real data sourced from Atrae itself to work on.

As a result, skilled data scientists from all over the world, including Japan, India and America, vied for the top spot, and development of the algorithm progressed rapidly. Competitions like this provide a unique opportunity to harness a huge store of collective knowledge and skills to help develop technically complex, precise models for data.

Competitions aren’t just about encouraging freelance data scientists to compete; they can also be a great way to recruit talent into companies through interviews or contact with the winners after the competition ends.

Bitgrit events in India: NIT — Jalandhar in Punjab, India (top), Jawaharlal Nehru University in Delhi, India (bottom)

How is a Competition Organized?

This competition ran from the 24th of August 2020 until the 31st of October that year.

We discussed the competition details with Atrae’s in-house team of data scientists, figuring out what data we could use and what challenge would draw in bitgrit users. After much deliberation, we decided to see if contestants could improve on the matching algorithm that predicts the compatibility between the users of Yenta.

Defining The Goal

We were looking for an algorithm that could, first, predict whether two users would match and, second, predict whether they would arrange to meet up. We decided that the best way to judge this second point was to check whether the users had left a review after matching.

We then sorted user compatibility into four cases:

(0) — User A and User B didn’t match.
(1) — User A and User B matched but didn’t meet.
(2) — The users matched but left negative reviews.
(3) — The users matched and left positive reviews.
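As a rough illustration, the four cases above could be encoded as a single label per user pair. This is only a minimal sketch: the `PairOutcome` record and its field names are hypothetical and are not taken from the actual competition data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PairOutcome:
    """Hypothetical record of what happened between two users."""
    matched: bool
    # None means no review was left, which the article treats as "didn't meet".
    review_positive: Optional[bool] = None


def compatibility_label(outcome: PairOutcome) -> int:
    """Map a pair outcome to one of the four cases (0 to 3)."""
    if not outcome.matched:
        return 0  # the users didn't match
    if outcome.review_positive is None:
        return 1  # matched, but no review, so we assume they didn't meet
    return 3 if outcome.review_positive else 2  # positive vs. negative review
```

For example, `compatibility_label(PairOutcome(matched=True, review_positive=False))` returns 2, the "matched but left negative reviews" case.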

With the challenge for the competition set, we then prepared the data set using the following processes.

Preparing the data set

First, we arranged user data into two types: personal data (education, skills, profile information, etc.) and mutual data (past swipes, reviews, etc.).

We then took privacy and the usefulness of data into account, omitting all unneeded/sensitive information. Lastly, we tested the quality of the data set by running it through our own user compatibility algorithm models.

Contestants had to pick out what they needed from the data files, which was a challenge in itself, as a vast number of different data types were available to use.

We found later that every prize winner had one thing in common: they spent a lot of their time on “feature engineering” before getting to work on the algorithm itself. Participants had to come up with creative and effective features (variables) in order to improve their algorithms and create higher-quality models.

Competition result

After two months, the competition came to an end and the prize money was awarded to the top four contestants. Nikhil, a data scientist from India, came first with a score of 87.0207%.

Though all the winners built their solutions using LightGBM, the highest scorers paid extra attention to the feature engineering step of their model-building process.

Photo of the winner, data scientist Nikhil. When asked about the competition, he told us that “I feel that the quality of the data contributed to the results, and I really enjoyed the process of creating my model.”

First-place winner Nikhil explained, “I spent a lot of my time trying out all sorts of feature combinations to get to grips with the data, as there was so much variation.”

He also added that he felt the difference in age of users and the number of times that a user swipes right had the most significant impact on the predictability of results. In the post-competition interviews, Nikhil was also kind enough to add, “I’m pleased that I was able to provide some value to Atrae and the users of Yenta.”
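To make the two features Nikhil mentions more concrete, here is a small sketch of how an age gap and a right-swipe count might be computed for a pair of users. The data layout (an `ages` dictionary and a list of `(from_user, to_user, direction)` swipe tuples) and all names are assumptions made for illustration; they do not reflect the real Yenta data or any winner's code.

```python
from collections import Counter


def right_swipe_counts(swipes):
    """Count right swipes per user.

    `swipes` is assumed to be an iterable of
    (from_user, to_user, direction) tuples.
    """
    return Counter(src for src, _dst, direction in swipes if direction == "right")


def pair_features(user_a, user_b, ages, swipe_counts):
    """Build a small feature dictionary for one pair of users."""
    return {
        # Absolute age gap between the two users.
        "age_diff": abs(ages[user_a] - ages[user_b]),
        # How often each user has swiped right overall (0 if never).
        "a_right_swipes": swipe_counts.get(user_a, 0),
        "b_right_swipes": swipe_counts.get(user_b, 0),
    }


# Hypothetical toy data.
ages = {"u1": 28, "u2": 35}
swipes = [("u1", "u2", "right"), ("u1", "u3", "right"), ("u2", "u1", "left")]
features = pair_features("u1", "u2", ages, right_swipe_counts(swipes))
```

With this toy data, `features` holds an age gap of 7, two right swipes for `u1`, and none for `u2`. In practice, features like these would be one small part of a much larger set fed into a model.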

Senkin, a Kaggle Grandmaster, told us that unlike competitions run by other AI companies, which provided only basic graph features, the #SwipeToSuccess competition provided an incredibly rich set of data and that he was able to learn a lot from it.

Competitions set up in this way are a great opportunity to gain ideas from data scientists worldwide and aid in the development of high-performance algorithms, as shown by this rundown.

We would like to finish the article by including some final thoughts by Mr. Sugiyama, who works as a data scientist at Atrae.

Closing notes by Mr. Sugiyama (Atrae DS)

Atrae data scientists and engineers

The models that ranked highly were all truly amazing. The results made it clear that there were some world-class data scientists competing, pushing themselves and going through a lot of trial and error to refine their algorithms.

We received a wide variety of models, ranging from those that mainly focused on gradient boosting decision trees and feature engineering to those incorporating graph convolutional networks (GCN). Judging by the level and variety of the models we received, I feel that we could not have achieved these results by ourselves.

The interviews with the winners were also incredibly insightful. Many feel that these contests are a game of chasing small gains in accuracy, but as I spoke with these amazingly gifted data scientists, I sensed a clear passion for data and a drive to solve complex problems, made possible only through hard work and countless rounds of trial and error.

None of the winners seemed to be concerned about the prize money; for them, it was about their analysis providing tangible, real value for society. I left knowing that the best of us are those with an extremely driven mindset and a need to create value.

In closing, I would like to thank DataGateway for their management of this competition, which was faultless. We were able to have in-depth discussions of our expectations and targets for the competition and arrived at something we were all happy with.

We created a classification system to judge the results, and we were very surprised by how accurately it predicted the ranking of contestants (models able to accomplish (0) or (1) would place on the higher end, and those that managed (2) or (3) would determine the top spots).

We are also very grateful to them for carrying out the data sorting and preprocessing without any leaks or problems. Doing all of this ourselves would have been extremely difficult, and we were able to rely on them completely throughout our collaboration.

