Workshop: Data Science at Readdle

Boris Tarovik and Ivan Budnikov

Boris Tarovik, RnD Engineer at Readdle
Ivan Budnikov, Machine Learning Engineer at Readdle

June 23

from 10:00 am till 7:00 pm

Creative Quarter

1A Sportyvna Square, Gulliver

Workshop: Data Science at Readdle is a joint project of Data Science UA and Readdle. Together with Ivan Budnikov and Boris Tarovik, we will look at aspects of a Data Scientist’s work, the life cycle of an ML project, basic ML algorithms, and examples of neural networks in production at companies such as Google, Readdle, and Prisma.

Participants will build a simple ML solution for house price estimation using sklearn/numpy, and a neural network that solves a computer vision problem using tensorflow.

Participation in the workshop is free with pre-registration (you must receive an email confirming your registration for the event).

Software requirements:

  • Python3, numpy, sklearn, tensorflow, skimage

Who the course is for:

  • people interested in Data Science and Machine Learning
  • students of computer science/math departments
  • developers/engineers/QA

Course program

Introduction

  • Data Science, Big Data, Machine Learning: what do they mean?
  • Machine Learning vs conventional algorithms: what’s the difference?
  • Types of ML: supervised, unsupervised, reinforcement

Knowledge you need to become a Data Scientist

  • theory vs practice
  • useful courses, articles, topics

Differences between the work of a Data Scientist and a Software Developer

  • what debugging looks like for a data scientist
  • thinking about the code more often than writing it

Data science in a product company vs. freelance

  • a product is always about quality and customers
  • data science is not only neural networks

Lifecycle of ML solution development

  • Data Mining. Importance of good data. Data sources, data markup.
  • Cleaning data. Data augmentation. Training/Validation/Test split.
  • Applying the ML algorithm.
  • Result metrics: training, validation and test errors.
  • Underfitting and overfitting: what they are and how to deal with them.
  • Final evaluation. Precision, recall, F1-score.
  • Network optimisation for release.
  • Release. Brief review of ways to improve the algorithm after release: centralised further training, decentralised further training, or a combination of both.
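
As a rough illustration of two steps from the lifecycle above (the training/validation/test split and the final evaluation with precision, recall and F1-score), here is a minimal sketch in plain Python. The function names `split_dataset` and `precision_recall_f1`, the 70/15/15 proportions, and the toy labels are assumptions made for this example, not material from the workshop.

```python
import random


def split_dataset(items, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle items and split them into training, validation and test lists.

    The 15% / 15% defaults are an illustrative assumption, not a rule.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical dataset of 100 sample ids.
train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))

# Hypothetical ground-truth labels and model predictions.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 1, 0, 1]
print(precision_recall_f1(y_true, y_pred))
```

The test portion is set aside once and used only for the final evaluation, while the validation portion is what you would look at when diagnosing underfitting or overfitting during development.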

Review of the simplest ML algorithms

  • K-means
  • PCA
  • LDA
  • Linear regression
  • Neural networks: what they are and where they came from. Block notation. Some further improvements:
  • convolutional nets
  • recurrent nets
  • LSTM

Practical part 1: creating a simple ML solution for house price estimation, using sklearn/numpy

Working examples of neural network solutions in production
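
To give a feel for what such an exercise might look like, here is a minimal sketch of house price estimation with linear regression fitted by least squares. It uses numpy only and synthetic data; the features (`area`, `rooms`), the price formula, and the 160/40 split are all assumptions invented for this illustration and are not the workshop's actual dataset or solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, hypothetical data: 200 houses with an area (m^2) and a room count.
area = rng.uniform(30, 150, size=200)
rooms = rng.integers(1, 6, size=200)
# Assumed "true" relationship plus noise, made up for this example.
price = 1000.0 * area + 5000.0 * rooms + 20000.0 + rng.normal(0, 3000, size=200)

# Design matrix with a column of ones for the intercept term.
X = np.column_stack([area, rooms, np.ones_like(area)])

# Hold out the last 40 houses to estimate the error on unseen data.
X_train, X_test = X[:160], X[160:]
y_train, y_test = price[:160], price[160:]

# Ordinary least squares: find weights minimising ||X_train @ w - y_train||^2.
weights, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)


def predict(features):
    """Predict prices for rows of [area, rooms, 1]."""
    return features @ weights


test_rmse = float(np.sqrt(np.mean((predict(X_test) - y_test) ** 2)))
print("weights (area, rooms, intercept):", weights)
print("test RMSE:", test_rmse)
```

sklearn's `LinearRegression` fits the same kind of model; the explicit numpy version is used here only to show the computation behind it.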

  • Readdle
  • Prisma
  • Google

Practical part 2: creating a neural network to solve a computer vision problem, using tensorflow

Sometimes things go wrong

  • lessons we’ve learned
  • practical recommendations