Data Innovation 101
Understanding the Jargon
(Excerpt from the Centre for Data Innovation’s “Data Innovation 101”) The topic of data-driven innovation is so new that many terms are poorly understood. These include “big data,” “open data,” “data science,” and “cloud computing.”
“Big data” refers to data that cannot be processed using traditional database systems, either due to the relative size and heterogeneity of the data set, or the speed at which it is updated. Big data has existed for decades in fields such as astronomy and atmospheric science, but the growth of digital data collection has rapidly brought the topic into many other fields. In some areas, big data technologies have allowed researchers to analyze entire populations, without having to rely on samples; in addition to enabling faster analysis, this has also resulted in increased model accuracy.
“Open data” refers to data that is made freely available without restrictions. Such data is most useful when it is released in non-proprietary, machine-readable formats. Open data can be used to drive innovation within and beyond the organization that created it because it allows other organizations to make use of it. A 2013 McKinsey Global Institute report estimated that open data could add over $3 trillion in total value annually to the education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance sectors worldwide.
“Data science” refers to the range of technologies, as well as statistical and computational techniques, that are used to analyze and derive insights from data. The term “data science” does not exclusively refer to these techniques and technologies as they are applied to big data; it also encompasses data analysis that is conducted using smaller data sets and traditional database systems.
“Cloud computing” is the practice of “renting” remotely located IT services, including processing capabilities, information storage, and software applications, on an as-needed basis. Cloud computing turns fixed costs into variable costs, and it allows organizations to scale their computing resources to meet demand.
The Benefits of Data
Data leads to better understanding and decision making among individuals, businesses, and government.
Individuals use data to make better decisions about everything from what they buy to how they plan for the future. These decisions can be minor, such as deciding whether to carry an umbrella based on weather forecasts, or major, such as deciding where to go to college based on school evaluations or predictions of future career earnings. Traffic data helps individuals find the most efficient route from point A to point B, saving time and gas in the process. Data from the electricity grid can help homeowners save on utility bills. User reviews on sites like Amazon help consumers discover the products that they like best. Yelp’s restaurant reviews help people decide where to enjoy their next meal, and (since the site has recently begun to integrate additional data from city health inspections) to factor food safety into these decisions.
Businesses use data to find new customers, automate processes, and inform business decisions. For example, Visa’s data-driven Advanced Authorization service alerts banks to potentially fraudulent transactions in real time, identifying as much as $1.5 billion in fraud around the world annually. Coca-Cola uses complex models to ensure that every batch of orange juice it blends tastes consistently fresh. Intel uses predictive modeling on data from its chip manufacturing plants to anticipate failures, prioritize inspections, and cut monitoring costs. GlaxoSmithKline conducts text analytics on data collected from online forums so that it can better understand and respond to the concerns of parents who delay vaccinating their children. Wind energy companies, such as Vestas, use complex weather models to determine the optimal locations for their turbines.
Government agencies use data to cut costs, prioritize social services, and keep citizens safe. The U.S. Securities and Exchange Commission analyzes data reported by publicly traded companies to identify suspicious filings and inform fraud investigations. The European Space Agency deploys satellites equipped with remote sensing technologies to track and analyze changes in the global environment and help forecast weather events, such as hurricanes and droughts. The U.S. Centers for Disease Control and Prevention uses social network analysis to better understand and stem the spread of communicable diseases. The UK’s Royal Mail uses analytic software to determine the most efficient delivery routes and make sure parcels get to their destinations as quickly as possible. The U.S. Institute of Education Sciences conducts randomized trials, inspired by clinical research, to collect data and measure the impact on learning outcomes of certain educational variables, such as choice of instructional materials. New York City’s Fire Department prioritizes inspections based on risk assessments derived from building data, which has resulted in the city reducing the number of annual fire deaths to the lowest since recordkeeping began in 1916.
Read more on the Centre for Data Innovation’s website.