#63: The Interplay Between Data Science and Agile with Lance Dacy
Agile Mentors Podcast - A podcast by Brian Milner and Guests - Wednesdays
Join Brian and his guest Lance Dacy as they address the interplay (and the skepticism) of combining Agile and data science. Tune in as they explore the art of crafting Minimum Viable Products (MVPs) to create impactful and efficient solutions.

Overview

In this episode of the Agile Mentors Podcast, Brian sits down with Lance Dacy to delve into the nuances of aligning data science with the software development mold while dispelling the myths along the way. Listen in as Lance shares his wealth of experience and insights, guiding listeners through the step-by-step process of building MVPs in data science projects and sharing how Agile principles apply to both worlds.

Listen Now to Discover:

[01:13] - Brian introduces Lance Dacy on the Agile Mentors Podcast. Since listeners appreciated the previous data science and Agile episode, Lance is here for Part Two, this time discussing how data science fits into the software development mold.
[02:00] - Addressing the skepticism of combining Agile and data science; Lance has both expertise and practical experience.
[02:43] - Lance emphasizes that he understands the naysayers' concerns but aims to help others comprehend the synergy.
[03:05] - Waterfall might be better: sorting out the different perspectives on Agile development and data science.
[04:45] - The importance of scoping and architecture in software development projects.
[05:15] - Challenging the notion of perfectly defined objectives.
[05:46] - Most software projects lack a completely predefined understanding.
[06:39] - How Agile's empirical process and mindset of experimentation align with data science.
[07:30] - Presenting a real-world MVP example combining business drivers and data science techniques.
[08:04] - Clarifying what Agile is: a philosophy based on values, not a step-by-step process.
[09:03] - The importance of sustainable pace and productivity in Agile.
[10:10] - Introducing the concept of MVP and acknowledging the evolution of data science techniques.
[11:00] - Discussing MVP in the context of data science, and aligning it with an empirical approach.
[11:38] - Highlighting the role of MVP in testing assumptions, mitigating risks, and gathering user feedback.
[12:00] - Exploring data science's practical relevance for consumers to forge a relatable discussion.
[12:47] - Acknowledging familiarity with the technology but uncertainty about the tactics.
[13:00] - Highlighting how AI and data science are pervasive in everyday technology use.
[13:29] - Examples of AI and data science integration: search engines, online shopping recommendations, social media content, smart homes, and more.
[14:42] - Introducing common uses of data science: customer segmentation and marketing techniques.
[15:19] - Applying clustering techniques like k-means for automated segmentation.
[15:34] - Lance shares his paper on supply chain optimization, using an ant colony algorithm.
[15:56] - The techniques and purpose of supply chain optimization.
[16:23] - Exploring data science applications: collaborative filtering, matrix factorization, neural networks.
[16:42] - Clarifying data scientists' approach: not a random process but based on problem-solving with models.
[17:18] - Iterative development as a primary reason for MVP in data science.
[17:57] - Using real-world performance data for model improvement.
[18:21] - Risk mitigation as a critical aspect of MVP: linking risk mitigation to surviving challenges and learning from them.
[19:51] - Starting with an MVP reduces risk by avoiding overly complex models without sufficient feedback.
[20:19] - Setting stakeholder expectations with an MVP: providing tangible insight into data science trade-offs and early deliverables.
[20:39] - Highlighting operational considerations of deploying and maintaining data models, addressing challenges in data pipelines, infrastructure, and monitoring.
[22:17] - An MVP approach aligns with Agile principles for data science.
[22:35] - Brian clarifies the misconception that MVP means sacrificing quality for speed.
[23:30] - Lance agrees, addressing the misconception, and emphasizes MVP's importance in learning and improvement.
[23:32] - Have you thought about training with Mountain Goat Software? With classes such as Mountain Goat Software's Certified Scrum Product Owner (CSPO), developed by Mike Cohn, and Team Home software for better interactivity during classes, you can’t go wrong.
[24:00] - Brian suggests transitioning to walking through a model or example of creating an MVP.
[24:07] - A tangible framework for mapping data science work to MVP steps, acknowledging the contextual nature of the process.
[24:50] - Lance acknowledges the complexity of the steps, so they’ve been posted below under resources.
[25:11] - The importance of problem definition and defining the scope of the MVP.
[26:34] - The challenge of gathering and preprocessing data.
[27:20] - Selecting a simple model that is easy to interpret and implement for faster training times, easier troubleshooting, and adherence to the principle of parsimony.
[29:12] - Using feature engineering to select the most relevant features for the model.
[29:33] - Choosing a manageable number of features for the model, rather than attempting to incorporate all available data, to avoid overcomplicating or overfitting the model.
[30:11] - Lance emphasizes the importance of selecting a simple model and conducting feature engineering based on the insights gained from that model.
[30:36] - Training the chosen machine learning model using preprocessed data, typically by splitting the data into training and validation sets.
[31:15] - The challenge of evaluating the model's performance and the importance of using the appropriate metrics.
[31:34] - The goal: create a model that is good enough for gathering feedback, which aligns with the concept of MVP.
[31:53] - Lance describes the last step of building an MVP: deploying the MVP by integrating the model into a suitable platform or application.
[32:26] - The importance of making the MVP accessible to end users.
[33:00] - The crucial feedback loop for making improvements to the model and features, and refining, scaling, or reconsidering the approach.
[34:09] - Why you might want to initially deploy a slightly higher-level model.
[34:57] - The parallel between the steps of creating an MVP in data science and the principles of Agile.
[35:18] - Brian adds that in data science, feedback not only comes from customers and users but also involves analyzing results and outcomes as a form of feedback to refine the model.
[35:53] - The importance of relying on scientific expertise to analyze the results of the model and evaluate its accuracy and validity.
[36:10] - In data science, the feedback loop also involves analyzing the outcomes and results, similar to the iterative process of receiving user feedback in software development.
[37:00] - Lance draws parallels between software development and data science by comparing the process of building software features with the steps involved in creating an MVP for data science.
[39:21] - Lance offers some practical examples, beginning with a recommendation system.
[41:06] - The decision tree approach and its benefits for stakeholders.
[43:00] - Lance talks about churn prediction and gradually incorporating more nuanced data.
[43:55] - MVPs for chatbots and the benefits of starting with simple scripted responses in a chatbot MVP.
[45:59] - Managing multiple projects.
[46:24] - The effectiveness of using logistic regression and decision trees for MVPs.
[47:00] - Lance emphasizes the importance of managing stakeholders' expectations.
[47:53] - Lance discusses the need to consider the context when interpreting model performance metrics and to involve stakeholders in these discussions.
[49:16] - The importance of collaboration between data scientists and stakeholders for delivering valuable solutions.
[50:11] - Lance draws a comparison between data science and software development in terms of the challenge of coordinating work across different specialized areas.
[51:00] - Lance highlights the importance of feedback and iterative adjustments for success.
[53:24] - Again, you can find Episode #54: Unlocking Agile's Power in the World of Data Science with Lance Dacy in the resources listed below.
[53:48] - We’d love to hear your thoughts on this topic and your suggestions for future topics. Just email [email protected]. If you enjoyed the episode, the best way to support us is to share it with others and subscribe to the Agile Mentors Podcast on Apple Podcasts.
[55:00] - Don’t forget to check out the Mountain Goat Software Certified Scrum and Agile Training Schedule, including Certified ScrumMaster (CSM), Certified Scrum Product Owner (CSPO), Advanced Certified ScrumMaster (ACSM), and Advanced Certified Scrum Product Owner (ACSPO) classes. I'd really love to see you in class!
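
The middle steps of the MVP walkthrough in the episode (select a simple model, split the data into training and validation sets, train, evaluate) can be sketched in a few lines of Python. This is a minimal illustration, not code from the show. The customer data is randomly generated, the single feature `weekly_logins` is a hypothetical stand-in for "usage frequency", and a one-feature threshold rule stands in for the logistic regression or decision tree a team would more likely reach for.

```python
import random


def make_toy_customers(n, seed=0):
    """Generate hypothetical (weekly_logins, churned) pairs; not real data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        churned = rng.random() < 0.3
        # Assumption for illustration: churners log in less often on average.
        logins = max(0.0, rng.gauss(2.0 if churned else 6.0, 1.5))
        rows.append((logins, churned))
    return rows


def split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of the rows, then cut it into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train_threshold(train_rows):
    """Pick the login threshold with the fewest training errors.

    The resulting model predicts churn when weekly_logins < threshold.
    """
    candidates = sorted({logins for logins, _ in train_rows})

    def errors(threshold):
        return sum((logins < threshold) != churned for logins, churned in train_rows)

    return min(candidates, key=errors)


def evaluate(threshold, rows):
    """Return (precision, recall) for the churn class on the given rows."""
    tp = sum(1 for logins, churned in rows if logins < threshold and churned)
    fp = sum(1 for logins, churned in rows if logins < threshold and not churned)
    fn = sum(1 for logins, churned in rows if logins >= threshold and churned)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


train_rows, valid_rows = split(make_toy_customers(500))
threshold = train_threshold(train_rows)
precision, recall = evaluate(threshold, valid_rows)
print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

A rule this simple is easy to explain to stakeholders, which is the point of the "select a simple model" step: the validation numbers give everyone something concrete to react to before a more sophisticated model is considered.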
References and resources mentioned in the show:
6 Reasons Why I Think Agile Data Science Does Not Work | by Ilro Lee
Why Data Science Doesn't Respond Well to Agile Methodologies
Lance’s SMU Paper (Ant Colony Algorithm and Traveling Salesman Problem)
#54: Unlocking Agile's Power in the World of Data Science with Lance Dacy
Certified Scrum Master Training and Scrum Certification
Certified Scrum Product Owner Training
Advanced Certified ScrumMaster®
Advanced Certified Scrum Product Owner®
Mountain Goat Software Certified Scrum and Agile Training Schedule
Join the Agile Mentors Community
Subscribe to the Agile Mentors Podcast on Apple Podcasts

Reasons for Quick MVP in Data Science are to support:
Iterative Development
Feedback Loop
Risk Mitigation
Setting Expectations
Operational Considerations

Steps of the MVP:
Problem Definition
Gather and Preprocess the Data
Select a simple model
Feature engineering
Train the model
Evaluate the model
Deploy the MVP
Collect Feedback
Iterate
Decision Time

Examples of MVP in Data Science
(Logistic regression and decision trees are often used as initial models due to their simplicity, interpretability, and relatively quick development time.)
Recommendation Systems: Instead of building a complex recommendation engine, a company might start with a simple rule-based system (e.g., recommending the most popular items) to gauge user interest and system engagement.
Churn Prediction: An MVP might involve creating a basic model based on a few key features (like usage frequency and customer complaints) to predict which customers might churn. Later versions can incorporate more nuanced data and sophisticated algorithms.
Natural Language Processing (NLP): For a chatbot, the MVP might involve scripted responses or basic keyword matching. Once deployed, user interactions can inform the development of more advanced NLP capabilities.

Conclusion
With Rapid MVP, context is crucial with regard to our general benchmarks (F1-Score, ROC-AUC, MAE, RMSE).
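
For readers less familiar with the four benchmarks just named, the sketch below computes each one from its textbook definition. The labels, scores, and values are invented toy numbers, and the function names are illustrative rather than taken from any particular library.

```python
import math


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """Chance that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mae(y_true, y_pred):
    """Mean absolute error for a regression model."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy classification example: labels, model scores, and predictions at a 0.5 cutoff.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
preds = [1 if s >= 0.5 else 0 for s in scores]

# Toy regression example.
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 17.0]

print(f"F1={f1_score(labels, preds):.3f} ROC-AUC={roc_auc(labels, scores):.3f}")
print(f"MAE={mae(actual, predicted):.3f} RMSE={rmse(actual, predicted):.3f}")
```

F1 and ROC-AUC apply to classifiers, while MAE and RMSE apply to models that predict a number, so which benchmark matters depends on the problem being solved.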
You should strive to always consider the context of those benchmarks with the problem being solved. In some medical diagnostic tests, even an F1-score of 0.95 might not be good enough due to the severe consequences of false negatives or false positives. We also likely need to compare the model's performance metrics with a simple baseline (e.g., random classifier, mean prediction) to determine how much value the model is adding. Always align the model's performance with business objectives. Even a model with a high ROC-AUC might not be suitable if it doesn't meet the specific precision or recall targets set by the business. Isn’t it better to find that out sooner rather than later?

Want to get involved?
This show is designed for you, and we’d love your input. Enjoyed what you heard today? Please leave a rating and a review. It really helps, and we read every single one. Got an Agile subject you’d like us to discuss or a question that needs an answer? Share your thoughts with us at [email protected]

This episode’s presenters are:
Brian Milner is SVP of coaching and training at Mountain Goat Software. He's passionate about making a difference in people's day-to-day work, influenced by his own experience of transitioning to Scrum and seeing improvements in work/life balance, honesty, respect, and the quality of work.
Lance Dacy is a Certified Scrum Trainer®, Certified Scrum Professional®, Certified ScrumMaster®, and Certified Scrum Product Owner®. Lance brings a great personality and servant's heart to his workshops. He loves seeing people walk away with tangible and practical things they can do with their teams straight away.