I spend a good amount of time reading books, listening to audiobooks and podcasts, watching webinars, and taking courses in artificial intelligence and machine learning. That has resulted in some fairly exhaustive note-taking which occasionally proves useful: I repurpose the notes for projects, and maybe you can too. This list will be updated as I work through more projects and learn new questions to ask.

How do you use this checklist? Depending on where you are in the machine learning lifecycle, scroll to that section, then think and/or talk through the bullets. Personally, I have found success using this list as a thought exercise to define the holistic solution rather than as a Q&A document template, although you could do that too.

Jump to lifecycle phase checklist:
- Frame the Problem
- Get the Data
- Explore the Data
- Prepare the Data
- Present the Findings
- Explore Different Models
- Fine-tune the Models
- Present the Solution
- Deploy/Serve, Monitor and Maintain
| #1 – Frame the Problem (and understand the space) |
|---|
| Summarize the complete business objective. |
| How would you solve the problem manually? |
| Is human expertise available? |
| Are you replacing an existing solution? |
| Do you have a baseline or are you creating a baseline? |
| Can you reuse experience or tools? |
| What are the comparable problems? |
| How will the solution be used? – Real-time (immediate) predictions – Batch (overnight) predictions |
| What are the current solutions/workarounds? |
| What are the compute resources (CPU/GPU/Memory) running the solution? – Cloud (greater resources, connectivity dependent) – Edge (fewer resources, must tolerate disconnection) |
| What are security and privacy requirements that could influence the solution? |
| How should performance be measured? – Queries Per Second (QPS) – Latency (result in 500 ms, 300 ms, etc.) – Throughput (e.g. 1000 QPS) |
| What is the minimum performance needed to attain the objective for a prototype? |
| Will data be logged for analysis or retraining? |
| What challenges are expected? |
| Data Quality issues are common (corrupt, late, incomplete data) |
| Model Decay from drift (past performance doesn’t guarantee future results) – Data/Feature/Population/Covariate Drift (the inputs changed and the trained model is no longer relevant) – Training-Serving Skew (training on artificial data that mismatches the real world) – Concept Drift (patterns the model learned no longer apply) – Gradual Concept Drift (the world changes and the model doesn’t) – Sudden Concept Drift (a pandemic, an overnight shift) – Recurring Drift (seasonality, Black Friday and Cyber Monday sales) – see the drift-check sketch after this table |
| List the assumptions made so far: |
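
As a concrete anchor for the drift bullets above, here is a minimal sketch (not part of the original checklist) of a covariate-drift check: compare a feature's training distribution with what the deployed model is seeing, using a two-sample Kolmogorov-Smirnov test from scipy. The synthetic data and alpha threshold are illustrative assumptions.

```python
# Minimal sketch of a covariate-drift check: compare a feature's training
# distribution with its serving distribution using a two-sample KS test.
import numpy as np
from scipy.stats import ks_2samp

def has_drifted(train_values, serving_values, alpha=0.01):
    """Return True if the serving distribution differs significantly from training."""
    _statistic, p_value = ks_2samp(train_values, serving_values)
    return p_value < alpha

rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5_000)    # feature as seen at training time
serving = rng.normal(loc=0.5, scale=1.2, size=5_000)  # shifted feature seen in production

print(has_drifted(train, serving))  # True -> flag for investigation/retraining
```

Other checks (per-feature summary statistics, population stability measures) follow the same pattern: compare a serving window against the training baseline and alert when the gap crosses a threshold.
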
| #2 – Get the Data (expect >1 source) |
|---|
| List the data needed and how much of it is required. |
| Where to get the data? |
| How much space will it take? |
| Check legal obligations (e.g. license terms). – Share — copy and redistribute the material in any medium or format – Adapt — remix, transform, and build upon the material for any purpose, even commercially. |
| Get access authorizations. |
| Create workspace with enough storage. |
| Get the data. |
| Convert the data to a format you can manipulate without altering the original data (a minimal sketch follows this table). |
| Ensure sensitive data is deleted or protected (e.g. anonymized). |
| Check the size and type of data (time series, sample, geographical). |
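
To make the "convert without altering the original" step concrete, here is a minimal sketch assuming a CSV source copied into a pandas/Parquet workspace; the file paths and source format are illustrative, not prescribed by the checklist.

```python
# Minimal sketch: copy the raw data into a working format (Parquet here) so
# exploration and preparation never touch the original source.
from pathlib import Path
import pandas as pd

raw_path = Path("data/raw/transactions.csv")           # treat the raw file as read-only
work_path = Path("data/interim/transactions.parquet")  # working copy for later phases

df = pd.read_csv(raw_path)
print(df.shape)    # quick size check
print(df.dtypes)   # quick type check (numeric, object/categorical, datetime, ...)

work_path.parent.mkdir(parents=True, exist_ok=True)
df.to_parquet(work_path, index=False)                  # requires pyarrow or fastparquet
```
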
| #3 – Explore the Data (to gain insights) |
|---|
| Create a copy of the data for exploration. |
| Create a Jupyter Notebook to keep record of exploration. |
| Is the data skewed? |
| Study each attribute/datapoint (a profiling sketch follows this table). – name – data type (numeric, categorical, bounded/unbounded, text, structured/unstructured, etc.) – unstructured data may not require feature engineering, while structured data tends to require more feature engineering – % missing values – Noisiness (stochastic, outliers, rounding errors, etc.)? – Usefulness for the task? – Type of distribution (Gaussian, uniform, logarithmic, etc.)? |
| For supervised learning tasks, identify the target attributes. |
| Visualize the data. |
| Study the correlations between attributes. |
| From the data, study how to manually, then automatically solve the problem. |
| Create/Collect better data. – Use augmentation to get more data – Improve label accuracy and data quality |
| Identify promising transformations. |
| Identify extra data that would be useful (…then go back to Get the Data). – Data augmentation: does the augmented data look/sound realistic? – Is the x > y mapping clear? – Can a human perform well on the data? – How does the algorithm perform on the new data? |
| Plan for data iteration loops (adding/improving data > training > error analysis). – If the data is unstructured, the model is large, and the x > y mapping is clear, then adding data shouldn’t hurt. – If the data is structured, the model is small, or the mapping is unclear, then adding data might hurt model performance. |
| Document it. |
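
Here is a minimal profiling sketch for the attribute, skew, and correlation bullets above, working on an exploration copy of the data; the Parquet path is an illustrative assumption carried over from the earlier sketch.

```python
# Minimal profiling sketch on an exploration copy of the working data.
import matplotlib.pyplot as plt
import pandas as pd

explore = pd.read_parquet("data/interim/transactions.parquet").copy()

explore.info()                        # attribute names, dtypes, non-null counts
print(explore.isna().mean())          # fraction of missing values per attribute
print(explore.describe())             # distribution summary for numeric attributes

numeric = explore.select_dtypes("number")
print(numeric.corr())                 # pairwise correlations between numeric attributes

numeric.hist(bins=50, figsize=(12, 8))  # eyeball skew, outliers, distribution shapes
plt.show()
```
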
| #4 – Prepare the Data (to expose underlying data patterns) |
|---|
| Work on copies of the data. – Feed into experiment tracking. |
| Write functions for all data transformations (see the pipeline sketch after this table). |
| Data cleaning. – Fix or remove outliers – Fill in missing values – Drop rows or columns |
| Feature selection. – Drop attributes that provide no useful information for the task |
| Feature engineering. – Discretize continuous features – Decompose features – Add transformations – Aggregate features |
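
A minimal sketch of expressing cleaning and feature engineering as reusable scikit-learn pipelines, so every transformation is a function you can re-apply; the column names and transformer choices are illustrative assumptions.

```python
# Minimal sketch: wrap cleaning and feature engineering in scikit-learn pipelines
# so every transformation is reproducible and reusable.
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["amount", "age"]          # illustrative column names
categorical_features = ["country", "channel"]

numeric_pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill in missing values
    ("scale", StandardScaler()),                   # standardize scale-sensitive features
])

preprocess = ColumnTransformer([
    ("num", numeric_pipeline, numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

# X_train = preprocess.fit_transform(train_df)   # fit on training data only
# X_valid = preprocess.transform(valid_df)       # reuse the exact same fitted transforms
```

Keeping the transformations inside a fitted pipeline also guards against the training-serving skew called out in phase #1, since the same fitted objects are reused at serving time.
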
| #5 – Present the Findings (to re-baseline everyone involved and solicit feedback) |
|---|
| Document what you have done. |
| Create a presentation. – You can convert notebooks to HTML slideshows with nbconvert |
| Explain why the data achieves the business objective. |
| Highlight interesting points. |
| What worked? |
| What didn’t? |
| Assumptions. |
| Limitations. |
| #6 – Explore Different Models (shortlist the top performers that fail differently) |
|---|
| Sample smaller training sets (if affordable). |
| Train many models from different categories (see the comparison sketch after this table). – see scikit-learn Choosing the right estimator |
| Measure and compare model performance. – Error Analysis – TensorFlow Model Analysis (TFMA) provides detailed metrics on new models across different slices of data |
| Analyze the most significant variables for each algorithm. |
| Analyze the types of errors the models make. |
| Things to track in your Experiment Tracker: – Algorithm used – Version of code – Dataset used – Hyperparameters used – Save the output – Save the score |
| Would a human have avoided the errors? |
| Perform a round of feature engineering and selection. – Perform 1-2 quick iterations. |
| Shortlist the top 3-5 most promising models that make different errors. |
| Document it. |
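
A minimal comparison sketch for training models from different categories and measuring them with the same cross-validated metric before shortlisting; the synthetic dataset, candidate models, and F1 metric are illustrative assumptions.

```python
# Minimal sketch: train models from different categories and compare them with
# the same cross-validated metric.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1_000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "svm_rbf": SVC(kernel="rbf"),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="f1")
    print(f"{name}: mean F1 = {scores.mean():.3f} (+/- {scores.std():.3f})")
```
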
| #7 – Fine-tune the Models (to select or combine for the best solution) |
|---|
| Use as much data as possible. |
| Fine-tune hyperparameters using cross-validation (a grid-search sketch follows this table). |
| Try ensemble methods that combine models. |
| Measure final model on Test set to estimate the error. |
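
A minimal grid-search sketch for tuning hyperparameters with cross-validation and then estimating the error once on a held-out test set; the dataset, model, and search space are illustrative assumptions.

```python
# Minimal sketch: tune hyperparameters with cross-validated grid search, then
# estimate generalization error once on a held-out test set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [100, 300], "max_depth": [None, 10, 30]},
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # single, final estimate on the test set
```
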
| #8 – Present the Solution (to re-baseline everyone involved and solicit feedback) |
|---|
| Create a presentation. – You can convert notebooks to HTML slideshows with nbconvert |
| Explain why the solution achieves the business objective. |
| Highlight interesting points. |
| What worked? |
| What didn’t? |
| Assumptions. |
| Limitations. |
| #9 – Deploy, Serve, Monitor and Maintain the System |
|---|
| Get model code ready for production. |
| Write monitoring code that checks at regular intervals and triggers alerts when performance drops (a minimal check follows this table). Examples in the following rows: |
| Fraction of non-null output. |
| Software metrics. – Memory – Compute – Latency – Throughput – Server load, etc. |
| Input (X) Distributions. – Average size of input (image, video, volume, count, etc.) – Missing values – Brightness, etc. |
| Output (Y) Distributions. – Count of null or empty ("") outputs returned – Repeat searches – Time between requests |
| What could go wrong and what statistics define them? |
| Auditing Framework. – Accuracy – Fairness/bias – Performance on subsets / new data – Error commonality – Error on rare cases |
| Model Quality. – Accuracy, Precision, Recall, F1_score – Mean Error Rate – Downstream business KPI (clickthrough rate) |
| Beware: – Slow degradation as data evolves – Monitor input quality as well as output |
| Retrain models on fresh data regularly (automate it!) – Manual and/or Automatic retraining |
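
A minimal monitoring sketch for the "fraction of non-null output" row, meant to be run on a schedule; the threshold and the use of a logging warning as the alert are illustrative assumptions, not a prescribed alerting stack.

```python
# Minimal sketch: a health check for the fraction of non-null output, intended
# to be invoked at regular intervals by whatever scheduler you already use.
import logging

def check_output_health(predictions, min_non_null_fraction=0.95):
    """Warn if too large a share of recent predictions came back null/None."""
    non_null = sum(p is not None for p in predictions)
    fraction = non_null / max(len(predictions), 1)
    if fraction < min_non_null_fraction:
        logging.warning("Non-null output fraction dropped to %.2f", fraction)
    return fraction

recent_predictions = [0.7, None, 0.4, 0.9, None, None, 0.2, 0.8, 0.6, 0.5]
print(check_output_health(recent_predictions))  # 0.7 -> below threshold, warning emitted
```
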
References
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition
- Machine Learning Engineering for Production (MLOps) Specialization
- Experience and failures along the way