Machine Learning Tools Used By The Kaggle Experts – Analytics India Magazine

There is no dearth of ML tools today. For a beginner, however, knowing the tool stack of those who consistently win Kaggle competitions is a great help; one can then go ahead and pick the tools of one's choice. Below, we look at the top tools, frameworks, cloud services and libraries used by Kaggle Masters and Grandmasters, as revealed to us in our exclusive interviews. That said, all these top Kagglers are of the opinion that one should not fall in love with tools: any tool is fine as long as it gets the job done right.

Abhishek Thakur, a 4x Kaggle Grandmaster, says that he frequently finds himself using TensorFlow for NLP problems and PyTorch for computer vision problems.

When it comes to favourite Python libraries, Thakur praises scikit-learn for providing many of the components needed to put a model into production.
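
As a rough illustration of the kind of components Thakur refers to, the sketch below chains preprocessing and a model into a single scikit-learn Pipeline that can be fitted, serialised and reloaded as one object. The iris dataset, the choice of StandardScaler and LogisticRegression, and the use of pickle are assumptions made for this example, not details from the interview.

```python
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small built-in dataset, used here only as a stand-in for real data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Preprocessing and model live in one object, so the exact same
# transformations are applied at training time and at prediction time.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

# Serialise the fitted pipeline and load it back, as a serving process might.
blob = pickle.dumps(model)
restored = pickle.loads(blob)

accuracy = restored.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Keeping the scaler inside the pipeline means the artefact that gets shipped carries its own preprocessing, which is one reason such components are convenient for production use.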

Thakur believes, however, that there is no shortage of libraries or frameworks these days, and that any of them is fine as long as one understands what is happening in the background.

Arthur says that a basic laptop sometimes suffices. Depending on the competition, however, he sometimes rents GPUs on Google Cloud Platform with Kaggle vouchers.

Here is what Arthur's toolkit looks like:

A Kaggle Master ranked in the top 20 on the competitions leaderboard, Mathurin says that he prefers Python to R, though he used R until 2015. Mathurin has been in this field for over a decade and a half, and a renewed interest in algorithms made him switch to Python gradually.

A look at Mathurin's toolkit, which he keeps coming back to:

Duc, who is ranked in the world top 50 and is also chief data engineer and co-founder of the Vietnamese AI startup Palexy, says that he and his team usually use one server with 2x1080Ti together with a Kaggle kernel. For a competition like DeepFake, he prefers renting a server with 4x1080Ti on AWS.

Talking about frequently used tools, Duc said that he usually finds himself using Keras-TensorFlow, OpenCV, albumentations, LightGBM (lgbm) and scikit-learn. A data engineer by profession, Duc says that the role of a data engineer is to collect data and prepare the data pipeline. To build the necessary infrastructure and architecture for data generation, a data engineering team uses SQL, MySQL, Spark, Hadoop, Hive, etc.

A data scientist, by contrast, is responsible for obtaining insights from data and formulating these insights into a model to communicate with clients. For this, data scientists use statistics, visualisation (matplotlib, seaborn), modeling (sklearn, TensorFlow, PyTorch), etc.

An AI engineer and a Kaggle Grandmaster, Darragh usually runs code from the command line and the Spyder IDE. He mainly uses AWS, and prototypes on his MacBook Pro, which he believes is enough to check that a pipeline is working well before deploying. As for frameworks, Darragh prefers PyTorch for the freedom to experiment that it offers compared with others.
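
The idea of checking a pipeline locally before deploying can be sketched as a "smoke test": run every stage end to end on a tiny sample, so that bugs surface on the laptop rather than on a rented server. Everything below is hypothetical and written in plain Python for brevity; the function names, the synthetic data and the threshold "model" are assumptions for illustration and not Darragh's actual setup.

```python
import random


def load_rows(n_rows, seed=0):
    """Hypothetical loader: returns synthetic (feature, label) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        x = rng.uniform(-1.0, 1.0)
        rows.append((x, 1 if x > 0 else 0))
    return rows


def train(rows):
    """Placeholder 'model': a threshold halfway between the two class means."""
    pos = [x for x, y in rows if y == 1]
    neg = [x for x, y in rows if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def evaluate(threshold, rows):
    """Fraction of rows that the threshold classifies correctly."""
    hits = sum((x > threshold) == bool(y) for x, y in rows)
    return hits / len(rows)


def run_pipeline(n_rows):
    """Runs load, split, train and evaluate; only n_rows differs between
    a quick local check and a full run."""
    rows = load_rows(n_rows)
    split = int(0.8 * len(rows))
    threshold = train(rows[:split])
    return evaluate(threshold, rows[split:])


# Tiny sample: the score itself matters less than every stage running cleanly.
smoke_score = run_pipeline(n_rows=200)
print(f"smoke test finished, accuracy on tiny sample: {smoke_score:.2f}")
```

The same `run_pipeline` call with a larger `n_rows` would then be the one launched on the bigger machine, which keeps the local check and the real run on a single code path.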
