## Workshop: Data Science at Readdle

Workshop: Data Science at Readdle is a joint project of Data Science UA and Readdle. Together with Ivan Budnikov and Boris Tarovik, we will look at aspects of a Data Scientist's work, the life cycle of an ML project, basic ML algorithms, and examples of neural networks in production at companies such as Google, Readdle, Prisma, and more!

Participants will build a simple ML solution for house price estimation using sklearn/numpy, and a neural network that solves a computer vision problem using tensorflow.

Participation in the workshop is free, but pre-registration is required (you must receive an email confirming your registration for the event).

#### Software requirements:

- Python3, numpy, sklearn, tensorflow, skimage

**Who the course is for:**

- people interested in Data Science and Machine Learning
- students of computer science/math departments
- developers/engineers/QA

#### Speakers

**Boris Tarovik**

RnD Engineer, Readdle

**Ivan Budnikov**

Machine Learning Engineer, Readdle

#### Course program

#### Introduction

- Data Science, Big Data, Machine Learning: what do they mean?
- Machine Learning vs. conventional algorithms: what's the difference?
- Types of ML: supervised, unsupervised, reinforcement

#### Knowledge you need to become a Data Scientist

- theory vs practice
- useful courses, articles, topics

#### Differences between the work of a Data Scientist and a Software Developer

- what debugging looks like for a data scientist
- thinking about code more often than writing it

#### Data science in a product company vs. freelance

- a product is always about quality and customers
- data science is not only neural networks

#### Lifecycle of ML solution development

- Data mining. Importance of good data. Data sources, data labeling (markup).
- Cleaning data. Data augmentation. Training/validation/test split.
- Applying the ML algorithm.
- Result metrics: training, validation, and test errors.
- Underfitting and overfitting: what they are and how to deal with them.
- Final evaluation. Precision, recall, F1-score.
- Network optimisation for release.
- Release. Brief review of further algorithm improvement after release: centralised after-training, decentralised after-training, combined.
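Two of the steps above, the training/validation/test split and the final precision/recall/F1 evaluation, can be sketched in a few lines. This is only an illustrative sketch and not the workshop's material: the function names, split fractions, and toy labels are assumptions made up for the example.

```python
import numpy as np


def train_val_test_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Return three disjoint index arrays (train, val, test) over n_samples."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)          # shuffle before splitting
    n_test = int(round(n_samples * test_frac))
    n_val = int(round(n_samples * val_frac))
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test


def precision_recall_f1(y_true, y_pred):
    """Binary precision, recall and F1, where label 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy labels, only to exercise the functions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))

train_idx, val_idx, test_idx = train_val_test_split(100, val_frac=0.2, test_frac=0.2)
print(len(train_idx), len(val_idx), len(test_idx))
```

The test indices are touched only once, for the final evaluation; the validation indices are the ones used while tuning the model.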

#### Review of the simplest ML algorithms

- K-means
- PCA
- LDA
- Linear regression
- Neural networks: what they are and where they came from. Block notation. Some further improvements:
  - convolutional nets
  - recurrent nets
  - LSTM
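As a taste of the first algorithm in the list, here is a minimal k-means sketch in numpy. It is an illustration under stated assumptions rather than the workshop's own code: the function name `kmeans`, the fixed number of iterations, and the toy points are all made up for the example.

```python
import numpy as np


def kmeans(points, k, n_iters=20, seed=0):
    """Plain Lloyd's algorithm: alternate assignment and centre update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Distance from every point to every centre, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)         # nearest centre per point
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:              # keep old centre if cluster is empty
                centers[j] = members.mean(axis=0)
    return centers, labels


# Hypothetical toy data: two well-separated groups of 2D points.
points = np.array([[0, 0], [0, 1], [1, 0],
                   [10, 10], [10, 11], [11, 10]], dtype=float)
centers, labels = kmeans(points, k=2)
print(labels)
```

A fixed iteration count keeps the sketch short; a practical implementation would usually stop once the assignments no longer change.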

#### Practical part 1: creating a simple ML solution for house price estimation using sklearn/numpy
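The announcement does not specify the dataset or model used in this part, so the following is only a hedged sketch of what a simple house price estimator can look like: ordinary least-squares linear regression in numpy. The features (area, number of rooms), the prices, and the function names are invented for illustration.

```python
import numpy as np

# Hypothetical toy data: [area in m^2, number of rooms] -> price (arbitrary units).
features = np.array([[50, 1], [70, 2], [90, 3], [120, 4], [150, 5]], dtype=float)
prices = np.array([75000, 100000, 125000, 160000, 195000], dtype=float)


def fit_linear_regression(X, y):
    """Least-squares fit of y ~ X @ w + b; returns (w, b)."""
    X_with_bias = np.hstack([X, np.ones((len(X), 1))])  # extra column for intercept
    coeffs, *_ = np.linalg.lstsq(X_with_bias, y, rcond=None)
    return coeffs[:-1], coeffs[-1]


def predict(X, w, b):
    """Apply the fitted linear model to new feature rows."""
    return X @ w + b


w, b = fit_linear_regression(features, prices)
new_house = np.array([[100.0, 3.0]])
print(predict(new_house, w, b))
```

With sklearn, `sklearn.linear_model.LinearRegression` fits the same kind of model through `fit` and `predict`.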

#### Working examples of neural network solutions in production

- Readdle
- Prisma

#### Practical part 2: creating a neural network to solve a computer vision problem using tensorflow

#### Sometimes things go wrong

- lessons we've learned
- practical recommendations