6 Essential Steps to Building a Great Data Culture
So questions like “what was our revenue last year” are not being thrown around as “data requests” anymore
In my years as a Data Science consultant, I have worked with clients whose data capabilities range from one end of the spectrum to the other. During this time, I have arrived at one key observation: Companies that have succeeded in taking advantage of the big data era build and maintain a good data culture throughout the whole organization instead of just in their “data groups”.
Companies that put money and effort into hiring strong data talent but pay no attention to building a data culture in the rest of the organization usually struggle to keep that talent. Their data hires eventually burn out and leave because they spend the majority of their time functioning as a “data Siri”: “Hey XXX, what was our revenue last year?”, “Hey XXX, can you pull data for our past three months’ sales?”
Building a data culture in the entire organization will equip employees in operations, product, marketing, HR, and many other departments with the ability to fulfill most of the simple data queries and data visualization in a self-serve manner. In addition to making these teams much more self-sufficient and nimble, it allows Data Analysts and Data Scientists to reallocate their time to their core responsibilities of building and improving data systems and models as well as providing complex, meaningful analyses that go beyond simple data pulls.
If you are a member of your company’s data org or a data advocate in general, read on. This article will give you some valuable tips about how to build a great data culture.
1. Start with no-code options (like Looker or Tableau)
Since most people with a demanding job won’t have time to pick up a new hard skill like SQL in a short timespan, no-code options are a less intimidating starting point to get people in your organization to be comfortable working with data.
A growing number of tools can help with this step, and Looker and Tableau are two of the most popular choices. It’s important to know the differences between the two and their limitations in order to choose the right one for your company.
Looker has some great built-in analytics functions that make adoption easier for people without in-depth analytics knowledge. One limitation that comes to mind is that Looker primarily connects to SQL databases and data warehouses such as PostgreSQL and Google BigQuery, and doesn’t currently have good options for ingesting CSV data. The main workaround is to upload the CSV files into one of those databases and connect that database to Looker. But what if ALL of your data is currently in Excel and there’s no data warehouse?
Tableau might be a better bet if all your data currently resides in spreadsheets. Tableau can handle both connections to data warehouses and the ingestion of local Excel files. However, it falls short in terms of customizable analytics and ease of use for people with less of an analytics background.
Even though the current no-code options have their shortcomings, they nevertheless lower the barrier to entry to the data world and serve as great starting points for building a data culture within your organization.
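To make the CSV workaround mentioned above more concrete, here is a minimal sketch of loading a CSV export into a database table that a BI tool could then connect to. It is only an illustration: SQLite stands in for a real warehouse, and the `sales` table, its columns, and the sample rows are made up for the example.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export; in practice this would be a file on disk.
CSV_TEXT = """order_id,order_date,amount
1,2023-01-15,120.50
2,2023-02-03,80.00
3,2023-02-20,42.25
"""


def load_sales_csv(conn, csv_file):
    """Create the (hypothetical) sales table if needed and load the CSV rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    rows = [
        (int(r["order_id"]), r["order_date"], float(r["amount"]))
        for r in csv.DictReader(csv_file)
    ]
    # Parameterized inserts keep the CSV values out of the SQL text itself.
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# SQLite in memory stands in for the warehouse in this sketch.
conn = sqlite3.connect(":memory:")
loaded = load_sales_csv(conn, io.StringIO(CSV_TEXT))
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
print(f"Loaded {loaded} rows, total amount {total}")
```

With a real warehouse, the connection and the loading mechanism would differ, but the idea is the same: once the spreadsheet data lives in a table, the no-code tool can query it like any other source.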
2. Provide on-the-job learning opportunities for teams that should learn more advanced options (SQL)
As great as the no-code options are, they inevitably hit their limitations and require more advanced knowledge of SQL when they are used to carry out complicated analyses. In addition, licenses can be costly, so many companies choose to restrict access to select teams.
For groups like operations and marketing, which routinely carry out complicated, data-heavy analyses or conduct A/B tests and other experiments, employees will eventually realize that they need enough SQL knowledge to build “derived tables” in Looker, or at least enough pseudocode to communicate their data needs efficiently to the data groups. Learning SQL can also enable your employees to bypass tools like Looker altogether and query databases directly, removing the bottleneck of tool limitations or reliance on data teams.
You can enable employees in those groups by providing on-the-job learning opportunities, for example through third-party providers such as Vertabelo, free alternatives such as W3Schools, and/or an internal SQL training developed by your data org.
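As an illustration of the kind of SQL these teams end up needing, here is a small sketch of an A/B test summary. The inner SELECT plays the role of a derived table: it aggregates raw rows first, and the outer query computes a conversion rate on top of it. The `visits` table and its data are invented for the example, and SQLite is used only so the snippet runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE visits (user_id INTEGER, variant TEXT, converted INTEGER);
INSERT INTO visits VALUES
  (1, 'A', 1), (2, 'A', 0), (3, 'A', 0), (4, 'A', 1),
  (5, 'B', 1), (6, 'B', 1), (7, 'B', 0), (8, 'B', 1);
""")

# The subquery (per_variant) is the derived table: one row per variant.
QUERY = """
SELECT variant, users, conversions, 1.0 * conversions / users AS rate
FROM (
    SELECT variant, COUNT(*) AS users, SUM(converted) AS conversions
    FROM visits
    GROUP BY variant
) AS per_variant
ORDER BY variant
"""

results = conn.execute(QUERY).fetchall()
for variant, users, conversions, rate in results:
    print(f"{variant}: {conversions}/{users} converted ({rate:.0%})")
```

Someone who can read and tweak a query like this can already answer many follow-up questions (a different date range, a different metric) without filing a request.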
3. Make sure everyone has access to the databases
This one sounds like a no-brainer, right? You would be surprised to learn how many companies make their databases accessible only to a handful of people in the data groups. As a result, every data pull request from the whole company falls on the shoulders of a dozen employees.
Whatever your reason for limiting access, there is usually an easy workaround. If the limitation exists to prevent accidental deletion or tampering with the data, write and edit permissions can be restricted to specific data groups while permission to query and read the databases is granted to everyone. If the limitation is due to concerns about sensitive data (e.g. PII, Personally Identifiable Information), it’s usually easy to implement a workaround such as building a secondary table without PII that is accessible to everyone.
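One way the PII workaround could look is sketched below. Note that it uses a view rather than a physically separate table, which is a substitution on my part: the view exposes only the non-sensitive columns, and read access would be granted on the view instead of the underlying table. The `customers` table and its columns are hypothetical, and SQLite has no user permissions, so the granting step itself is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    full_name   TEXT,   -- PII
    email       TEXT,   -- PII
    signup_date TEXT,
    plan        TEXT
);
INSERT INTO customers VALUES
  (1, 'Ada Example', 'ada@example.com', '2023-01-05', 'pro'),
  (2, 'Bob Example', 'bob@example.com', '2023-03-17', 'free');

-- PII-free view: the object that everyone would be allowed to query.
CREATE VIEW customers_shared AS
SELECT customer_id, signup_date, plan FROM customers;
""")

cursor = conn.execute("SELECT * FROM customers_shared ORDER BY customer_id")
shared_columns = [col[0] for col in cursor.description]
shared_rows = cursor.fetchall()
print(shared_columns)
print(shared_rows)
```

Analysts outside the data groups can still count signups or break customers down by plan, while names and emails never leave the restricted table.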
Introducing analytics and building a data culture already pushes a lot of people out of their comfort zones; don’t unnecessarily discourage them by making them fight for permissions.
4. Have all your data in one place: look into data warehouse solutions
Having your data in different places is a pain for everyone who uses data; for example, no-code options usually can’t join tables from different connections (different databases). That’s where data warehouses like Snowflake and BigQuery come in. They serve as a one-stop shop for data and avoid having multiple sources of truth living in different places, which easily causes confusion.
If your data is currently living in different places, don’t fret; with all the recent developments in data warehouse solutions, relatively few engineering resources are required upfront to migrate all your data into a data warehouse.
5. Have good documentation for the databases and tables and have a dedicated group (Slack or other channels) to answer data-related questions
Considering the level of complexity of most companies’ data, it’s no surprise that nobody has the full knowledge of everything that lives in the databases. Educating every data user about every column in every table is virtually impossible.
So it’s important for every table’s owner/creator to properly document the table at creation so future users don’t have to rack their brains trying to figure out whether the unit of the “Weight” column is “lb” or “kg”. It’s also helpful to have a dedicated group of people to answer questions like this one when something about the tables is poorly documented or hard to understand; imagine a customer service department for your data sources.
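Documentation does not need heavy tooling to be useful. As a rough sketch, a table owner could keep a small data dictionary next to the table definition; the `ColumnDoc` class, the `shipments` table, and the `describe` helper below are all hypothetical names chosen for illustration, not part of any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ColumnDoc:
    """Human-readable documentation for one column."""
    description: str
    unit: Optional[str] = None  # e.g. "kg"; None when a unit does not apply


# Hypothetical data dictionary, keyed by (table, column).
DATA_DICTIONARY = {
    ("shipments", "weight"): ColumnDoc("Gross weight of the shipment", unit="kg"),
    ("shipments", "shipped_at"): ColumnDoc("Time the shipment left the warehouse (UTC)"),
}


def describe(table, column):
    """Return a one-line description, or point the reader to the table owner."""
    doc = DATA_DICTIONARY.get((table, column))
    if doc is None:
        return f"{table}.{column}: undocumented, ask the table owner"
    unit = f" [{doc.unit}]" if doc.unit else ""
    return f"{table}.{column}: {doc.description}{unit}"


print(describe("shipments", "weight"))
print(describe("shipments", "carrier"))
```

The undocumented case is deliberate: it tells the reader where to go next, which is exactly the job of the dedicated help channel.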
6. Start linking queries (or no-code option sources) to numbers
Imagine you have seen last year’s sales number on two different reports; one states that sales were 10 million, the other 9 million. What’s causing the discrepancy? Is it because one includes tax while the other does not? Or is it because one uses the company’s fiscal year and the other the calendar year? You can come up with all sorts of hypotheses for this discrepancy, but checking them would be a huge pain if you don’t know exactly how those numbers were pulled.
That’s why it’s so important to link sources for all the numbers and graphs on reports and presentations. It doesn’t matter whether they are SQL queries you used or Looker/Tableau dashboard links; those linked sources will make life easier for anyone who wants to reproduce the report or is simply curious how to find the numbers.
Another benefit of linking sources is taking advantage of the reusability of queries/dashboards. Imagine someone from another department is trying to find out the sales number for the past 10 years; instead of scheduling a meeting with you to ask about how you pulled the numbers, they can simply use your linked query and change a few parameters. Now you get 15 minutes back on the calendar that you can use to grab a coffee.
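Here is a small sketch of what a linked, reusable query might look like, and of how the discrepancy described above can arise from the same data. The `sales` table, its rows, the `sales_total` helper, and the October-to-September fiscal year are all assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (sold_on TEXT, net_amount REAL, tax REAL);
INSERT INTO sales VALUES
  ('2022-11-10', 100.0, 10.0),
  ('2023-02-01', 200.0, 20.0),
  ('2023-08-15', 300.0, 30.0),
  ('2023-12-20', 400.0, 40.0);
""")


def sales_total(conn, start, end, include_tax=False):
    """Sum sales with start <= sold_on < end (ISO date strings)."""
    # Only one of two fixed expressions is interpolated, never user input.
    amount = "net_amount + tax" if include_tax else "net_amount"
    row = conn.execute(
        f"SELECT COALESCE(SUM({amount}), 0) FROM sales "
        "WHERE sold_on >= ? AND sold_on < ?",
        (start, end),
    ).fetchone()
    return row[0]


# Same data, three defensible answers to "what were sales in 2023?"
calendar_2023 = sales_total(conn, "2023-01-01", "2024-01-01")
fiscal_2023 = sales_total(conn, "2022-10-01", "2023-10-01")  # assumed Oct-Sep fiscal year
calendar_2023_gross = sales_total(conn, "2023-01-01", "2024-01-01", include_tax=True)
print(calendar_2023, fiscal_2023, calendar_2023_gross)
```

If the report links to this query along with the parameters used, a colleague can see at a glance which definition produced the number, and can change the date range to get the ten-year figure themselves.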
With all the steps mentioned above, hopefully, questions like “What was our revenue last year?” will no longer show up in anyone’s inbox.