2022 NormConf - My favourite talks

listicle

learning

spark

art

simulation

Lessons from real-life data science practitioners

Author

Daniel Tan

Published

December 20, 2022

2022 NormConf

NormConf is a free, virtual tech conference created by Vicki Boykis with a keen focus on all the unglamorous challenges that real-life data science practitioners encounter in their day-to-day work. In an arena where many other tech conferences focus on the latest and greatest, this conference is a breath of fresh air! I came away with more ideas to try at work than from any previous conference I have attended.

The conference was held in a single day, split into 3 sessions and several lightning talks. Each session had about 10 speakers covering a variety of topics. All of the talks are available for free on NormConf’s YouTube channel.

I spent the weekend catching up on all the talks and wanted to share a couple of my favourites:

Session 1

Spark Horror Stories From The Field by Guenia Izquierdo

This talk right here is totally my jam. Apache Spark is an amazing tool. In fact, to conduct and model experiments for a large-scale business operation (like I do for work), the usage of Spark is pretty much table stakes. That said, like many powerful tools, Spark is also a frustrating footgun.

Guenia shares her experience of the common pitfalls you might encounter when building Spark applications for the first time. I found myself nodding silently as she went through every mistake that I personally made when I first started learning Spark. She also gives useful tips explaining the implications of each Spark command you invoke, which promotes a better understanding of the entire Spark framework.

If you are new to Spark, I recommend at least the first couple of minutes of Guenia’s talk, as she provides one of the clearest and most succinct crash courses on Spark I have seen.

A Game of Construction by Helena Sarin

Art is all a game of construction. Some with the brush, some with the shovel, some choose the pen and some of us … choose neural networks

That quote up there is how Helena started her talk 🤯 and boy was I absolutely blown away. Describing herself as an “engineering-artist”, Helena shared details and examples of her artwork and art process. I was deeply inspired by the many similarities Helena drew between the process of refining artwork and the iterative nature of building complex models.

I have always enjoyed the rare occasion where I get to play with generative art, and listening to a true artist’s artist share about her process was an incredible treat.

Session 2

ML doesn’t always replace rules, sometimes they work together by Jeremy Jordan

Jeremy’s talk touches upon a very real set of skills required to run machine learning solutions in a highly regulated environment such as a bank or financial institution. I really enjoyed the nuance Jeremy brings to how we can combine rules-based heuristics and machine learning into a robust policy layer.

As with any complex solution, the talk is less a how-to and more a framework of discussions you should be looking to have if your team is building a solution that matches this use case.
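To make the idea of a "policy layer" a little more concrete, here is a minimal sketch of rules and a model score working together. Everything in it is my own assumption for illustration and not taken from Jeremy’s talk: the `Transaction` fields, the placeholder country code `"XX"`, the amount cut-off and the `0.8` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transaction:
    amount: float
    country: str
    model_score: float  # hypothetical fraud probability from an ML model, 0..1


def rule_decision(tx: Transaction) -> Optional[str]:
    """Hard rules that take precedence over the model (all hypothetical)."""
    if tx.country in {"XX"}:  # placeholder for a blocked jurisdiction
        return "decline"
    if tx.amount < 1.00:  # trivially small amounts are approved outright
        return "approve"
    return None  # no rule fired, so defer to the model


def policy(tx: Transaction, threshold: float = 0.8) -> str:
    """Policy layer: check the rules first, then compare the model score to a threshold."""
    decision = rule_decision(tx)
    if decision is not None:
        return decision
    return "review" if tx.model_score >= threshold else "approve"


print(policy(Transaction(amount=250.0, country="SG", model_score=0.91)))  # review
print(policy(Transaction(amount=250.0, country="XX", model_score=0.05)))  # decline
print(policy(Transaction(amount=0.50, country="SG", model_score=0.99)))   # approve
```

Keeping the rules in their own function means each decision can be traced back to either a named rule or the model threshold, which is the kind of auditability a regulated environment tends to ask for.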

Ethan Rosenthal and the M1 Misadventure by Ethan Rosenthal

Ethan’s M1 Misadventure covers all the trials and tribulations he went through to get his team’s codebase built on the new M1 Macs, and what can I say… A story about chasing obscure errors, finicky dependencies and the endless loop of upgrading/downgrading/backtracking changes just to get something to work? That’s both my guilty pleasure 🍿🍿 and an average work day 🥲.

Ethan also shares some useful tips about Python environments and dependency management, if that is what you are looking for. I enjoyed this talk mostly because it validates that getting that darn code to compile/run is a common problem for any data scientist.

They don’t call it dependency hell for nothing 😈😈😈

Session 3

How many folds is too many? Efficient simulation for everyday ML decisions by Julia Silge

Whereas most of the other talks I’ve shared thus far are broadly inspirational and entertaining, Julia’s talk is something I could bring directly back to my everyday work. It tackles a question that many new data science learners have but that is hard to answer: when performing cross validation, how many folds is too many? And while Julia devotes a good amount of time to that question, what her talk really illustrates is how to break down a question and empirically approach a solution via simulation.

…unless you cop out with an answer like “it depends” 🙄

Simulation is such a powerful tool that goes beyond just the introduction of resampling and bootstraps. Simulation is a paradigm that empowers you to structure your belief of how a system works explicitly as a model, and validate those assumptions with real data. I have found so many areas of my work where approaching the thorny problem through the lens of simulation has clarified my understanding of viable solutions.
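Here is a rough sketch of what "answering the folds question by simulation" could look like. It is not Julia’s code or her setup: the data-generating process (`y = 2x + noise`), the single-predictor least-squares model, the sample size and the candidate values of `k` are all assumptions I picked to keep the example small and dependency-free.

```python
import random
import statistics


def simulate_dataset(n, rng):
    """Draw n points from an assumed data-generating process: y = 2x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0, 3) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least squares for a single predictor; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def cv_rmse(xs, ys, k):
    """k-fold cross-validated RMSE, assigning point i to fold i % k."""
    n = len(xs)
    sq_errors = []
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        test = [i for i in range(n) if i % k == fold]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        sq_errors += [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test]
    return (sum(sq_errors) / n) ** 0.5


def compare_folds(ks=(2, 5, 10, 20), n=60, n_sims=200, seed=42):
    """For each k, summarise the CV estimate across many simulated datasets."""
    rng = random.Random(seed)
    datasets = [simulate_dataset(n, rng) for _ in range(n_sims)]
    results = {}
    for k in ks:
        estimates = [cv_rmse(xs, ys, k) for xs, ys in datasets]
        results[k] = (statistics.fmean(estimates), statistics.stdev(estimates))
    return results


for k, (mean_rmse, sd_rmse) in compare_folds().items():
    print(f"k={k:>2}  mean CV RMSE={mean_rmse:.3f}  sd across simulations={sd_rmse:.3f}")
```

Because the true noise level is known here (a standard deviation of 3), you can see how close each choice of `k` gets and how much the estimate wobbles from one simulated dataset to the next. Swapping in a data-generating process and model that resemble your own problem is where the real value lies.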

If you only have time for one talk, this would be my recommendation.

Don’t do invisible work by Chris Albon

This one is slightly nostalgic for me, as Chris’s data science notes were one of the first data-science-focused resources I chanced upon when first entering the field, and it felt really great to hear him speak. Chris’s talk focuses on the importance of not doing invisible work and shares various tips on how you can systematically log the important milestones you have achieved and the impact you have created for the business.

While there is a clear connection between this talk and your career as a data scientist, I like to think that this practice of putting your work out there also has benefits from a learning angle. Reflecting on the short time I have committed to sharing articles on this blog, I can clearly see the improvement in my clarity of writing and ability to articulate ideas. Beyond the benefits to your career, ensuring you are not doing invisible work is also a very important step to take charge of your own learning.

Lightning Talks

Hell is other people’s bugs by TJ Murphy

TJ’s lightning talk covers the common bugs he has seen in experimental platforms. The blind spot here is that while the statistical theory of whichever method you choose might be infallible, the implementation of said method in live experiments is not. As this is precisely the niche area of data science I work in, this talk was pretty much an instant-share for me.

How to name files by Jenny Bryan

If there was ever one topic that anyone, and yes, I mean anyone who works in tech has an opinion on, it’s the right way of naming files. Jenny’s talk covers her recommendations (I agree with them … mostly). This talk is the one that best fits the “normcore” spirit of the entire conference. You have to watch it.
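I have not listed Jenny’s recommendations above, so treat the following as my own sketch and not a summary of her talk. It assumes three principles that are commonly associated with her advice on naming things: file names should be machine readable, human readable, and sort well by default. The helper names `slugify` and `make_filename` are hypothetical.

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase the text and collapse any run of non-alphanumerics into one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def make_filename(day: date, *parts: str, ext: str = "csv") -> str:
    """Put an ISO 8601 date first so names sort chronologically, then join the
    slugged parts with underscores so the fields can be split apart by code."""
    fields = [day.isoformat()] + [slugify(p) for p in parts]
    return "_".join(fields) + "." + ext


name = make_filename(date(2022, 12, 20), "NormConf Talks", "Session 1", ext="md")
print(name)  # 2022-12-20_normconf-talks_session-1.md
```

Using hyphens inside a field and underscores between fields means `name.split("_")` recovers the date and each label without any guesswork.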

Conclusion

🎉🎉 Congrats and well done to everyone who made NormConf 2022 a success! It was truly my favourite conference this year.

There are 2 ways of spreading light. Be the candle or the mirror that reflects it – Edith Wharton

Reuse

https://creativecommons.org/licenses/by/4.0/

Citation

BibTeX citation:

@online{tan2022,
  author = {Tan, Daniel},
  title = {2022 {NormConf} - {My} Favourite Talks},
  date = {2022-12-20},
  url = {https://www.ddanieltan.com/posts/2022-normconf},
  langid = {en}
}

For attribution, please cite this work as:

Tan, Daniel. 2022. “2022 NormConf - My Favourite Talks.” December 20, 2022. https://www.ddanieltan.com/posts/2022-normconf.