# Data Culture

- The data team needs to focus on delivering insights and supporting decisions. The outcomes of the data team are *decisions* and a *shared context across the organization* that makes coordination easier.
- Your goal as a data professional is to facilitate [[Making Decisions|decision making]] and [help surface/investigate the performance of a business](https://sqlpatterns.com/p/delivering-value-as-a-data-team) (e.g. [operational](https://twitter.com/ergestx/status/1731324299590479989)).
- Learning to drive decisions quickly, a bias to action, is a critical competency for an analyst. Every skill you learn – [[communication]], [[writing]], [[experimentation]], [[Metrics|metric design]] – supports this.
- [If analysis is not actionable, it does not really matter](https://twitter.com/decisionleader/status/1661041373783441408). Analysis must drive action. [Clear results won't spur action by themselves](https://www.linkedin.com/posts/eric-weber-060397b7_data-analytics-machinelearning-activity-6675746028144205824-CQxW/). The organization needs to be ready to pivot when something isn't working.
- [Data doesn't make decisions, people do](https://twitter.com/teej_m/status/1765475939084029956).
- [Data's impact is tough to measure: it doesn't always translate to value](https://dfrieds.com/articles/data-science-reality-vs-expectations.html).
    - The value of "insights" is often unknown.
- The Data Team should be building and iterating on the [Data Product](https://locallyoptimistic.com/post/run-your-data-team-like-a-product-team/).
- Notebooks are a workshop. Production systems are the factory. Not everything needs to be put into production. Not everything should be a notebook. You need both. Lean into the strengths of each.
- Data is fundamentally a collaborative design process rather than a tool, an analysis, or even a product. [Data works best when the entire feedback loop from idea to production is an iterative process](https://pedram.substack.com/p/data-can-learn-from-design).
- [To get buy-in, explain how the business could benefit from better data](https://youtu.be/Mlz1VwxZuDs) (e.g: more and better insights). Start small and show value.
- Run *[Purpose Meetings](https://www.avo.app/blog/tracking-the-right-product-metrics)* or [Business Metrics Reviews](https://youtu.be/nlMn572Dabc).
    - Purpose Meetings are 30-minute meetings in which stakeholders, engineers, and data align on the goal of a release and on the best way to evaluate its impact and understand its success. Align on the goal, commit to metrics, and design the data.
    - A Business Metrics Review is a 30 to 60 minute meeting to discuss and explore key metrics and to teach people how to think with data.
    - You don't hit a quantitative goal by focusing on the goal. You hit a quantitative goal by focusing on the process.
    - Business Reviews are one of the best ways to get people to think about data.
- Clear goals and expectations are valuable. Validate what you think your job is with your manager and stakeholders, repeatedly.
- [Weekly Business Review meetings are a process control tool](https://commoncog.com/the-amazon-weekly-business-review/): a tool designed to uncover and disseminate the causal structure of a business.
- [While the output of your team is what you want to maximize, you'll need some indicators that will help guide you day-to-day](https://data-columns.hightouch.io/your-first-60-days-as-a-first-data-hire-weeks-3-4/). Decide what's important to you (test coverage, missing documentation, queries run, models created, ...), and generate some internal reports for yourself.
- [Data teams should be a part of the business conversations from the beginning](https://cultivating-algos.stitchfix.com/). Get the data team involved early, have open discussions with them about the existing work, and decide how to prioritize new work against the existing backlog. Don't accept new work without addressing the existing bottlenecks, and don't accept new work without requirements. **Organizational [[politics]] matter way more than any data methods or technical knowledge**. The hard bit about becoming data driven in business isn't the technical bits. It's the political bits.
- Including data people in meetings causes happy accidents!
- The layout of the organization affects how long information takes to propagate, and every layer adds losses along the way.
- The modern data team needs to have *real organizational power*: it needs to be able to say "no" and mean it. If your data team does not truly have the power to say no to stakeholders, it will get sent on all kinds of wild goose chases, be unproductive, experience employee churn, etc.
- Data should report to the CEO, ideally with at least some weekly metrics split into (a) notable trends, (b) watching closely, and (c) business as usual.
- If data is the most precious asset in a company, does it make sense to have only one team responsible for it?
- [People talk about data as the new oil but for most companies it's a lot closer to uranium](https://news.ycombinator.com/item?id=27781286). It's hard to find people who can handle or process it correctly, there are nontrivial security risks and liabilities if PII is involved, it's expensive to store, and it generally offers an underwhelming return on effort relative to the anticipated utility.
- The purpose of becoming data driven is to build a causal model of the business in your head. The purpose of doing all this work is that you want to understand how your business actually works and grows, not rely on superstitious beliefs about how your business works and grows.
- You become data driven by looking at the data, not so much by hiring or expanding the data team. You can't outsource it.
- [The pain in data teams comes from needing to influence PMs/peers while having little control over them. Data teams need to become really great internal marketers/persuaders](https://anchor.fm/census/episodes/The-evolution-of-the-data-industry--data-jobs-w-Avo-CEO-and-Co-founder-Stefania-Olafsdottir-e16hu1l). That said, it shouldn't be the data team's job to convince the organization to be data driven. That's not an effective way of spending resources.
- Executives are expected to be data driven, even if they don't know what it means.
- The epistemology of the leadership team really, really matters.
- People problems are orders of magnitude more difficult to solve than data problems.
- **Integrate data where the decision is made**. E.g: Google showing restaurant scores when you're looking for somewhere to have dinner.
    - Reduce the time to insights. If the data is already in the tool you're using, then there's zero time to insights. Provide a set of tools with the same data and let people choose depending on the goal.
- Data rarely moves fast enough across companies to enable data-informed decisions.
- [Earning the authority to deny requests is one of the most important factors to running a world-class data team](https://twitter.com/teej_m/status/1420432376270782464).
- Data professionals can build consensus as the company becomes more diverse.
- Data systems can establish methods for understanding the world even as it becomes more complex. Data is complex, and will stay that way. Not even great business people or strategy can make data simple. So we need data people to help guide business efforts.
- Data literacy can create pathways for anyone to contribute equally to the organization's reality.
- [Understanding variation is the beginning of data literacy](https://twitter.com/ejames_c/status/1732597443127382369).
- Create a single, central space to post data requests ([[Data Practices#Data Request Template]]).
    - On the other hand, data analysis and data science are domain-level problems and cannot be centralized.
- Create a single space to [[Data Practices|share the results of analysis and decisions made based on them]].
- Log changes so everyone can jump in and be aware of what's going on.
- Log assumptions and lessons learned somewhere. This information should loop back into the data product.
- Make the warehouse the source of truth for all the teams.
    - What data are Finance/HR/Marketing using to set their OKRs? Put that in the warehouse and model it.
- [[Metrics]] should be derived from the data sources that most accurately reflect reality. E.g: using internal databases instead of product tracking for "Users Created".
- Do you want better data? Hire people interested in data!
- Having managers tell the data team to "Find Insights" is a telltale sign of bad data management and organizational structure.
- Good use of data is, ultimately, a question of good epistemology. ("Is this true? What can we conclude? How do we know that?") Good epistemology is hard. It must be taught.
- **When things are going well, no one cares about data**. The right time to present data is when things are starting to go bad. Use your early-warning systems to spot when it looks like it's time for data to step in and save the day, then position data as a solution in whatever framing makes sense to the stakeholders. The stakeholders are decision makers and they don't have a lot of time. They're looking to make decisions and solve problems.
- [So much of data work is about accumulating little bits of knowledge and building a shared context in your org so that it's possible to have the big, earth shattering revelations we all wish we could drive on a predictable schedule](https://twitter.com/imightbemary/status/1536368160961572864).
- A big purpose of data is knowledge. Knowledge is **"theories or models that allow you to predict the outcomes of your business actions"**. Insights may originate from data but are confirmed through actions.
- You won't have the best allocation of resources in a reactive team. Data teams need extra [[slack]]. [Balance user requests with actual needs](https://scientistemily.substack.com/p/product-management-skills-for-data).
- Do weekly recaps in Slack to highlight key items, company-wide progress toward north stars, improvements in certain areas, and new customer highlights. All positive and fun stuff.
- How can we measure the data team's impact?
    - Making a [[Writing a Roadmap|roadmap]] can help you tell whether you are hitting milestone deadlines or letting them slip.
        - Embedded data team members need to help other teams build their roadmaps too.
    - Also, having a changelog ([do releases!](https://betterprogramming.pub/great-data-platforms-use-conventional-commits-51fc22a7417c)) will help show the team's impact on the data product over time.
- [Push for *centralization of the reporting structure*, but keep *work management decentralized*](https://erikbern.com/2021/07/07/the-data-team-a-short-story.html).
- Unify resources (datasets, entities, definitions, metrics). Have one source of truth for each one and make that clear to everyone. That source of truth needs heavy curation. Poor curation leads to confusion, distrust, and… lots of wasted effort.
    - Aim to share the source of truth with the production code. Usually database information is better than tracking information.
    - Data should be defined unambiguously in a single place, so anyone can look up definitions without confusion.
    - If a definition or business logic changes, backfills should happen automatically so the data stays up to date.
- [Organizations have *too much* data. Without better ways of organizing it, large volumes of data are more overwhelming than useful.](https://towardsdatascience.com/good-data-citizenship-doesnt-work-265f13a37fa5)
    - Use the questions people are asking to find data *hotspots* and focus your energy on those. That means some corners of your data will be messy, and some concepts will go undocumented. Data is perennially broken and messy. **Embrace the mess**.
    - Get excited when people ask questions. Embrace confusion and curiosity. Offer help. Be friendly.
- [Reality is complex and multidimensional and often difficult to comprehend](https://mobile.twitter.com/rahulj51/status/1485429967131639808).
- [Document data when it's generated](https://davidsj.substack.com/p/the-data-chasm). Make it part of the process of adding a new event, table, or replication job, while the change is still top of mind. If possible, embed it in the development process, and pester people when they don't include the necessary updates. This shifts the burden of documentation upstream, making it part of the development cycle.
- To align stakeholders' incentives with the data team, stakeholders should show their impact through data. This forces stakeholders to [[Product Analytics|plan tracking]] and think about metrics.
- [To achieve distribution, build for who your stakeholder truly is, not for the stakeholder you want them to be](https://ian-macomber.medium.com/launching-and-scaling-data-science-teams-three-years-later-f1fa6f25b4ae).
- You should have something that answers the following questions:
    - Is [[Data Quality|the data correct]]?
    - Is the data up to date?
    - Is the data accessible and discoverable?
    - Is the data secure?
- [Good things come from knowing what a system is doing and when it is doing it](https://buz.dev/blog/the-contract-powered-data-platform).
- Measure [[data quality]] to help set high standards for your data team.
    - Only after measurement can you optimize cost.
    - Only after timing can you make things faster.
- [Forecasts need to have error bars](https://andrewpwheeler.com/2023/11/19/forecasts-need-to-have-error-bars/)!
- [Aim for a culture of celebrating measurable progress and learnings, rather than celebrating shipping](https://erikbern.com/2021/07/07/the-data-team-a-short-story.html).
- Align the company on key actions. Every stakeholder should know how to explore the related data.
- Do pre-mortems. Where would we see the impact of *X* going wrong? Model that and plot it on a dashboard.
- You can force coordination by making a chart and starting the discussion with it. Having a default chart will force people to argue about the definition and also provides a starting point. Discussions are much better when they are based on data and definitions.
    - Coordination happens when people agree on data, direction, and how to move to the desired place.
- [Send surveys](https://docs.google.com/forms/d/e/1FAIpQLSfufs_0zOGlFiE6oqrdZU7xCi399CBYbIlZkAMe15GTRRcPZA/viewform) from time to time to surface pain points and find out where the issues are.
    - E.g: Do you have access to the data you need to make decisions in your role?
- [Bring the collaboration process inline with the assets to allow for better handoffs and feedback](https://pedram.substack.com/p/data-can-learn-from-design).
- [Culture eats strategy (and tools) for breakfast](https://news.ycombinator.com/item?id=29062266). Until there's a cultural mindset shift in how companies value data and metadata, nothing will change.
- [Tools eat process for breakfast](https://benn.substack.com/p/the-product-is-the-process). No matter how much you blog about best practices, or how many talks you give about better ways to work, people will eventually find their way back into the behavioral grooves cut by the products they use (e.g: dbt, GitHub, ...).
- Most of the work done in data is an effort to **reduce entropy**: model data to remove inaccuracies, turn commonly asked questions into self-serve reports, and funnel ad-hoc questions into a formalized request process. This attitude is in the nature of data practitioners. When it comes to driving decisions with data, though, **embrace the chaos**.
    - Data doesn't so much drift towards entropy as **sprint towards it**.
- [Navigating the chaos to arrive at a trustworthy recommendation is one of the most important jobs to be done](https://roundup.getdbt.com/p/iterating-on-your-data-team). Decisions usually need to be made quickly, and data analysts are often [not invited to the table early enough](https://petrjanda.substack.com/p/bring-data-analyst-to-the-table). Again, be lean and iterate.
- Data is *not* a "set it and forget it" kind of activity. Your dashboard *will* get stale in less than six months. Your key metrics *will* eventually have bad data in them. That machine learning model you spent all of last quarter developing *will* **[drift](https://towardsdatascience.com/model-drift-in-machine-learning-models-8f7e7413b563)** from its original fit. The environment in which your business operates is constantly changing, and so is the product or service that your business delivers. As a result, what is knowable about your business, about your product or service, is constantly changing too. And fast.
- [Have regular cleanups and audits to keep data in check](https://www.avo.app/blog/data-literacy-why-people-dont-trust-data-tips-from-patreons-dir-of-data-science). They are crucial to keeping trust in your data high. [Schedule time to delete stuff](https://twitter.com/EdDaWord/status/1532148425487097857).
- We're moving from software consumers to data consumers. Data and BI will become more and more federated (you get data insights on your JIRA card without having to leave JIRA).
- Over time, data literacy across organizations will become commonplace the same way typing has. [Most professionals, at all levels of the business, will be capable of generating their own insights without requiring a data team](https://roundup.getdbt.com/p/data-expertise-everywhere).
- Data practitioners acknowledge that solid reporting is at the bottom of the data hierarchy of needs, but few companies do even basic KPI reporting well.
- [Doing the fundamentals really well almost always exposes how little is actually understood about why things are happening. It's uncomfortable for high-performing people to acknowledge that their grip on the levers is slippery](https://twitter.com/gwenwindflower/status/1498822586255519744).
- [Data ownership is a hard problem](https://www.linkedin.com/posts/chad-sanderson_heres-why-data-ownership-is-an-incredibly-activity-6904107936533114880-gw8n/). Data is fundamentally generated by services (or front-end instrumentation), which are managed by engineers. CDC and other pipelines are built by data engineers. The delineation of ownership responsibilities is very rarely established, with each group wanting to push 'ownership' onto someone else so they can do the jobs they were hired for.
- [Becoming a data-driven organization is a journey, which unfolds over time and requires critical thinking, human judgement, and experimentation](https://hbr.org/2022/02/why-becoming-a-data-driven-organization-is-so-hard). Fail fast, learn faster.
- [Data-drivenness is about building tools, abilities, and, most crucially, a culture that acts on data](https://twitter.com/ejames_c/status/1732592768890057115).
- [Path to create a data-driven organization](https://twitter.com/_abhisivasailam/status/1520274838450888704):
    1. Get a well-placed leader with influence to message, model, and demand data-driven execution.
    2. Hire/fire based on data aptitude and usage.
    3. Create mechanisms that force analytical conversations. Sometimes there is no way around spending an afternoon breaking down metrics by different segments until you find The Thing.
- [Start small. Don't try to wrangle data for the entire company until you have the tools and process down for one team](https://data-columns.hightouch.io/your-first-60-days-as-a-first-data-hire-weeks-3-4/).
- The difficulty of working with data grows exponentially with the size of the organization.
- [Rule of thumb: your first customer as a data person should be growth](https://twitter.com/josh_wills/status/1577699871335010304).
- [Data is used largely to answer these questions](https://roundup.getdbt.com/p/bring-back-scenario-analysis):
    - What already happened? Descriptive statistics.
    - What will happen? Predictive statistics.
    - What would happen if…? Scenario analysis.
    - Most importantly, what are we trying to improve? What types of benefits are we expecting to see?
- [Stakeholders will always need more data, and it's hard to say no. Communicating technical complexity to non-technical colleagues can be tough. Some reasons the ask might be hard](https://twitter.com/RichardSwinbank/status/1671780316573310977):
    - You know if the data exists.
    - You know where it is.
    - You know how to translate that question to a data question.
    - You know how to answer that data question by converting the data that exists into an answer.
    - You're aware of the quirks in the data.
- [Differentiate analytics from data platform work. They are two different jobs, and expecting one to do the work of the other is a trap](https://twitter.com/jamesdensmore/status/1518998298111225857).
    - Data Platform: data infra, pipelines, and a bit of data warehouse modeling.
    - Analytics: making sense of data to guide decisions.
- Make your [modeling technique](https://data-columns.hightouch.io/untitled-2/) explicit.
- Have a documentation [entry point for data](https://github.com/mozilla/data-docs).
- [For self-serve, aim to own as little as possible, and keep in mind that you can't make people do what you want, but you can stop them from doing what you don't want](https://youtu.be/wyW6hQGZxgY).
    - [You need to make a grocery store. You can't give folks directions to the farm to pick their own produce](https://twitter.com/teej_m/status/1603205457992044545).
- It's easy to lie with statistics, but it's hard to tell the truth without them.
    - On the other hand, good science doesn't need statistics; you can just look at the scatterplot.
- Most people approach data with an "optimisation worldview", thinking in terms of "make number go up". There is an alternative: the Process Control worldview, which says "Here is a process. Your job is to discover all the control factors that affect this process."
    - Your job is to figure out what you can control that affects the process, and then systematically pursue that.
    - You can discover these control factors in one of two ways:
        - [[Experimentation]].
        - Observing sudden, unexplained special variation in your data, which you must then investigate to uncover new control factors that you don't already know about.
- Don't over-rely on data. [Data is inherently objectifying](https://schmud.de/posts/2024-08-18-data-is-a-bad-idea.html) and naturally reduces complex concepts and processes to coarse representations. There's a certain fetish for what can be quantified (the [McNamara fallacy](https://en.wikipedia.org/wiki/McNamara_fallacy)).
- [It's hard to capture reality with data](https://javisantana.com/fastdata/40-things-I-learned-about-data.html). Modelling reality always gets complex. There are always small nuances, special conditions, things that changed, edge cases and, of course, errors (which sometimes become features). Data visualizations are lossy.

## Tools

Sometimes, [tools help create cultural change](https://commoncog.com/becoming-data-driven-first-principles/#the-trick). These might help your organization think better with data.

- [Process Behavior Charts](https://demingalliance.org/resources/articles/process-behaviour-charts-an-introduction) and [Process Control](https://two-wrongs.com/statistical-process-control-a-practitioners-guide.html).
- [Time-Lagged Conversions](https://better.engineering/modeling-conversion-rates-and-saving-millions-of-dollars-using-kaplan-meier-and-gamma-distributions/).
- Change point detection.
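As a rough illustration of a process behavior chart, the sketch below computes the natural process limits of an XmR (individuals and moving range) chart and flags points outside them as candidate special-cause variation worth investigating. It assumes a plain list of metric values and treats the first `baseline` points as the period that defines the limits; `xmr_limits` and `flag_outside_limits` are hypothetical names, and 2.66 is the usual XmR scaling constant (3 / 1.128). It only applies the "point outside the limits" rule and skips the run-based rules a full chart would also check.

```python
from statistics import mean


def xmr_limits(values):
    """Natural process limits for an XmR (individuals) chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = mean(values)
    # Moving ranges: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = mean(moving_ranges)
    # 2.66 = 3 / d2, where d2 = 1.128 for moving ranges of two points.
    return {
        "center": center,
        "lower": center - 2.66 * avg_mr,
        "upper": center + 2.66 * avg_mr,
    }


def flag_outside_limits(values, baseline=None):
    """Indices of points outside limits computed from the first `baseline` points."""
    reference = values[:baseline] if baseline else values
    limits = xmr_limits(reference)
    return [
        i
        for i, v in enumerate(values)
        if v < limits["lower"] or v > limits["upper"]
    ]


# Hypothetical weekly metric: a stable baseline followed by one unusual week.
weekly_signups = [10, 12, 11, 13, 12, 11, 12, 30]
print(flag_outside_limits(weekly_signups, baseline=7))  # → [7]
```

The point is less the arithmetic than the habit: routine week-to-week wiggles stay inside the limits and don't deserve a story, while a point outside them is a prompt to go find a control factor you didn't know about.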
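The linked time-lagged conversions post uses Kaplan-Meier (alongside gamma distributions). A minimal Kaplan-Meier sketch follows: users who signed up recently and haven't converted *yet* are treated as censored rather than as non-converters, so young cohorts aren't counted as failures just because they haven't had time to convert. The input format (a list of `(days, converted)` pairs) and the name `km_conversion_curve` are assumptions for illustration.

```python
from collections import Counter


def km_conversion_curve(observations):
    """
    Kaplan-Meier estimate of cumulative conversion over time.

    observations: list of (days, converted) pairs, where `days` is the time to
    conversion if converted, otherwise the days observed so far (censored).
    Returns [(day, cumulative_conversion_rate), ...] at each day with a conversion.
    """
    conversions = Counter(days for days, converted in observations if converted)
    leaving = Counter(days for days, _ in observations)  # converted or censored
    at_risk = len(observations)
    not_converted = 1.0  # estimated probability of not having converted yet, S(t)
    curve = []
    for day in sorted(leaving):
        events = conversions.get(day, 0)
        if events:
            not_converted *= 1 - events / at_risk
            curve.append((day, 1 - not_converted))
        # Everyone whose observation ends on this day leaves the risk set afterwards.
        at_risk -= leaving[day]
    return curve


# Hypothetical cohort: (days since sign-up, converted?)
cohort = [(1, True), (2, False), (3, True), (3, False), (5, True)]
for day, rate in km_conversion_curve(cohort):
    print(day, round(rate, 3))
```

With real data you'd want many users per cohort; with tiny risk sets, like the last day in this toy cohort, the tail of the curve jumps around a lot.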
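The list above doesn't prescribe a change point method. One common, simple choice is a two-sided tabular CUSUM, sketched below under the assumption that you know a stable target level for the metric. `target`, the allowance `k` (often set to about half the shift you care about detecting), and the decision threshold `h` are tuning parameters you'd pick per metric; `first_change_point` is a hypothetical name.

```python
def first_change_point(values, target, k, h):
    """
    Two-sided tabular CUSUM. Returns the index at which a sustained upward or
    downward shift away from `target` is first signalled, or None.
    """
    upper = lower = 0.0
    for i, x in enumerate(values):
        # Accumulate deviations beyond the allowance k; floor the sums at 0.
        upper = max(0.0, upper + (x - target - k))
        lower = max(0.0, lower + (target - x - k))
        if upper > h or lower > h:
            return i
    return None


# Hypothetical daily metric that steps up from 10 to 13 at index 5.
series = [10, 10, 10, 10, 10, 13, 13, 13, 13, 13]
print(first_change_point(series, target=10, k=0.5, h=4))  # → 6
```

The signal arrives one point after the shift starts. That delay is the price of not reacting to a single noisy observation; lowering `h` detects shifts sooner but raises more false alarms.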