Who is Afraid of the Big Bad Data?
by Brendan Reid (Associate Lecturer MA Graphic Arts)
Metaphor allows people to understand one thing in terms of another, without thinking that the two are objectively the same.[i]
Once upon a time there was a magic box, a digital TARDIS that could take us anywhere. Soon the message spread and the Internet set sail. This is how new technologies, from the telephone to the X-ray machine, have so often been viewed: as a metaphorical magic box. The World Wide Web, only twenty-four years old, has created a whole new set of metaphors for itself. The magic box allows us to travel without leaving the helm of our armchair; remember the Microsoft strapline, ‘Your journey starts here.’[ii] From its earliest days the most common way to take that journey has been by sea, and the metaphor of sea travel has been central to the Internet. Think of Netscape with its ship’s-wheel logo, or of surfing, data streaming, phishing and piracy. We are all captains of our own ships, even as we drown in a sea of information. Not only do we command the digital seas, we can also design and consume the objects within them. Need a smoking jacket for your dog? Would you like some unicorn milk in your Frappuccino?[iii] Or do you want your signature turned into a 3D vase?[iv] And why not? Anything else would be an infringement of your human rights. You are the captain, after all, and the world is your oyster. These sites really do exist, except for the unicorn milk. The holy grail of advertising has been the concept of mass customisation,[v] the ability to personalise the customer experience. This marketing dream is now a reality. As Brad Peters explains in Forbes:
“The extraordinary richness of modern life, especially as it has reached out to include 3 billion of the world’s people, can be largely credited to the mass customisation revolution. But now, Big Data… promises to take this relationship to the next level: mass personalization.”[vi]
But what is big data? Now let us take a leap of the imagination away from reality. What if we are not captains, but fish caught in government and corporate nets? Both kinds of net hit the headlines in 2013, when data collection became a hot topic. On 6 June 2013 the Guardian newspaper started the ball rolling by publishing the first of a series of articles by its security and privacy journalist Glenn Greenwald. The articles were based on material passed to him by Edward Snowden, a former National Security Agency (NSA) contractor whose identity had not yet been disclosed. Over the following weeks they revealed how the NSA and allied agencies, such as Britain’s GCHQ, were collecting data from social media sites and mobile devices and storing it on servers for future analysis. Snowden is claimed to have exposed mass surveillance by the US government. The government was accused of using two key classified intelligence programmes, Prism[vii] and a data-mining tool called Boundless Informant,[viii] to spy on citizens. Why did this happen now? The ability to gather and store information has been with us since the inception of the Internet, but “we reached a tipping point, where the value of having user data rose beyond the cost of storing it,” said Dan Auerbach, a technology analyst with the Electronic Frontier Foundation, an electronic privacy group in San Francisco. “Now we have an incentive to keep it forever.”[ix] Meanwhile, social media sites have grown into voluntary data-mining operations on a scale that rivals or exceeds anything the government could attempt on its own. “You willingly hand over data to Facebook that you would never give voluntarily to the government,” said Bruce Schneier, the author and technologist.[x] Data now streams from daily life: from phones, credit cards, televisions and computers; from the infrastructure of cities; from sensor-equipped buildings, trains, buses, planes, bridges and factories.
The data flows so fast that the total accumulation of the past two years, a zettabyte, dwarfs the prior record of human civilisation. “There is a big data revolution,” says Gary King, Weatherhead University Professor at Harvard. The ability to store data is one thing, but how do you use it? For King, the quantity of data is not what is revolutionary: “The big data revolution is that now we can do something with the data.” The revolution lies in improved statistical and computational methods, not in the exponential growth of storage or even of computational capacity. King explains:
‘The doubling of computing power every 18 months (Moore’s Law) “is nothing compared to a big algorithm,” a set of rules that can be used to solve a problem a thousand times faster than conventional computational methods could.’[xi]
One colleague, faced with a mountain of data, figured out that he would need a $2 million computer to analyse it. Instead, King and his graduate students came up with an algorithm within two hours that would do the same thing in 20 minutes on a laptop. King says big data’s potential benefits to society go far beyond what has been accomplished so far. Google has analysed clusters of search terms by region in the United States to predict flu outbreaks faster than was possible using hospital admission records. “That was a nice demonstration project,” says King, “but it is a tiny fraction of what could be done” if academic researchers could access the information held by companies. (Businesses now possess more social science data than academics do, he notes, a reversal of the recent past, when the opposite was true.) If social scientists could use that material, he says, “We could solve all kinds of problems.”
King’s comments point to another reality of big data: it is not fairy dust, but gold dust. Why share when there is money to be made? Recently, in a CNBC Squawk Box[xii] interview, “The Pulse of Silicon Valley,” host Joe Kernen put the question to Ann Winblad, senior partner at Hummer Winblad: “What is the next really big thing?” Her response: “Data is the new oil.” Winblad talked about predictive analytics[xiii] as the new hotspot for venture investing and discussed the growth of companies that can derive value from the huge amounts of data being stored. This was not the first time the phrase “data is the new oil” had been heard. Back in 2006, for example, marketing commentator Michael Palmer blogged:
‘Data is just like crude. It’s valuable, but if unrefined it cannot really be used. It has to be changed into gas, plastic, chemicals, etc., to create a valuable entity that drives profitable activity; so must data be broken down, and analysed for it to have value.’[xiv]
The simple act of browsing the Internet or subscribing to magazines catches your personal information in a complex net of buyers, sellers and brokers. Advocates of what has been dubbed “behavioural advertising” say it can eliminate unwanted, annoying ads by collecting information about your interests. Monitoring begins with the cookie, a small text file that websites and advertisers have your browser save on your computer. It is retrieved later and compiled with other cookies to develop a complex portrait of your online behaviour. Data flows from social networks and Internet companies to data brokers. The brokers combine it with other data to create lists of specific groups, with names like ‘Pensioners with Pension Funds’ and ‘Tropical Beach Resort Goers.’ These lists are bought, sold, exchanged and bartered, forming the basis of the big data economy. The product of all this collecting, analysing and trading is then shipped to the data users. These are mainly advertising companies, but can also include fundraisers and non-profit organisations. They buy or rent lists to better understand their target demographic based on specific traits such as ethnicity, income, property value and hobbies. Data users do not target you specifically. Instead, they analyse a specific list to build profiles. Marketers can then net a target area with advertising, or open a new store in an up-and-coming area. Whilst it is easy to blame Big Brother, we also have a personal responsibility. A survey carried out by Microsoft’s advertising arm found that 36% of UK citizens polled, and a staggering 45% of Canadians polled, said they would be willing to give up personal data for a price. Recently, Internet pioneer Sir Tim Berners-Lee said that his invention, the World Wide Web, should be safeguarded, and accused Western nations of hypocrisy over web spying. Asked to comment on Edward Snowden’s revelations, he said:
‘The original design of the web of 24 years ago was for a universal space, we didn’t have a particular computer in mind or browser, or language. When you make something universal…it can be used for good things or nasty things…we just have to make sure it’s not undercut by any large companies or governments trying to use it and get total control.’[xv]
Given what we know about government and corporate uses of the Internet, maybe even Sir Tim has missed the boat. Holidays tend to relax us and make us let our guard down. So the next time you set sail on the Internet, do not forget your lifejacket.
[i] Sweetser, Eve E. (1990). From Etymology to Pragmatics: Metaphorical and Cultural Aspects of Semantic Structure. Cambridge Studies in Linguistics 54. Cambridge: Cambridge University Press.
[ii] Search for any Microsoft product, from Word and the Xbox to cloud computing, and the image of a journey appears.
[iii] Starbucks does this with frappuccino.com, where the company lets users build their own virtual Frappuccino, with ingredients such as raspberry flavouring and protein powder. This allows Starbucks to measure the popularity of different ingredients as well as popular combinations, such as caramel and whipped cream, before investing in any actual process or ingredient changes in its stores.
[iv] Shapeways.com: “Design, prototype, buy and sell custom products at Shapeways, the world’s largest online 3D printing community.”
[v] Mass customisation is a production process that combines elements of mass production with those of bespoke tailoring. Products are adapted to meet a customer’s individual needs, so no two items are the same.
[vii] First reported by the Guardian newspaper on Friday 7 June 2013. The article stated: “The National Security Agency has obtained direct access to the systems of Google, Facebook, Apple and other US internet giants, according to a top secret document obtained by the Guardian. The access is part of a previously undisclosed program called Prism, which allows officials to collect material including search history, the content of emails, file transfers and live chats, the document says.”
[viii] First reported by the Guardian newspaper on Tuesday 11 June 2013. The article stated: “The Guardian has acquired top-secret documents about the NSA data mining tool, called Boundless Informant, that details and even maps by country the voluminous amount of information it collects from computer and telephone networks.”
[xii] Squawk Box is a business news television programme that airs at breakfast time on the CNBC network. The programme is currently co-hosted by Joe Kernen.
[xiii] Predictive analytics encompasses a variety of statistical techniques from modelling, machine learning and data mining that analyse current and historical facts to make predictions about future or otherwise unknown events.