Google is extending its reach ‘upwards’ and ‘backwards’. The ‘upwards’ direction is into ‘the cloud’, as it encourages us to store our data and software offsite on its remote servers, instead of in our homes and offices. The ‘backwards’ direction is into cultural history, as it becomes the market leader in scanning, storing and analysing the trillions of words that have been written since the dawn of history.

"The Cloud"

There are some wonderful possibilities and one or two dangers in all this. Charles Arthur reports on some of the dangers.

Google’s new cloud computing ChromeOS looks like a plan “to push people into careless computing” by forcing them to store their data in the cloud rather than on machines directly under their control, warns Richard Stallman, founder of the Free Software Foundation and creator of the operating system GNU.

Two years ago Stallman, a computing veteran who is a strong advocate of free software via his Free Software Foundation, warned that making extensive use of cloud computing was “worse than stupidity” because it meant a loss of control of data.

Now he says he is increasingly concerned about the release by Google of its ChromeOS operating system, which is based on GNU/Linux and designed to store the minimum possible data locally. Instead it relies on a data connection to link to Google’s “cloud” of servers, which are at unknown locations, to store documents and other information.

The risks include loss of legal rights to data if it is stored on a company’s machines rather than your own, Stallman points out: “In the US, you even lose legal rights if you store your data in a company’s machines instead of your own. The police need to present you with a search warrant to get your data from you; but if they are stored in a company’s server, the police can get it without showing you anything. They may not even have to give the company a search warrant.”

“I think that marketers like ‘cloud computing’ because it is devoid of substantive meaning. The term’s meaning is not substance, it’s an attitude: ‘Let any Tom, Dick and Harry hold your data, let any Tom, Dick and Harry do your computing for you (and control it).’ Perhaps the term ‘careless computing’ would suit it better.”

He sees a creeping problem: “I suppose many people will continue moving towards careless computing, because there’s a sucker born every minute. The US government may try to encourage people to place their data where the US government can seize it without showing them a search warrant, rather than in their own property. However, as long as enough of us continue keeping our data under our own control, we can still do so. And we had better do so, or the option may disappear.”

It might sound a bit paranoid, but remember that Amazon recently removed WikiLeaks from its cloud hosting service on the grounds that the site had breached its terms and conditions.

Alok Jha, more positively, writes about the new science of ‘culturomics’ that has emerged to analyse the vast databases of newly scanned literature.

How many words in the English language never make it into dictionaries? How has the nature of fame changed in the past 200 years? How do scientists and actors compare in their impact on popular culture?

These are just some of the questions that researchers and members of the public can now answer using a new online tool developed by Google with the help of scientists at Harvard University. The massive searchable database is being hailed as the key to a new era of research in the humanities, linguistics and social sciences that has been dubbed “culturomics”.

The database comprises more than 5m books – both fiction and non-fiction – published between 1800 and 2000, representing around 4% of all the books ever printed. Dr Jean-Baptiste Michel and Dr Erez Lieberman Aiden of Harvard University have developed the search tool, which they say will give researchers the ability to quantify a huge range of cultural trends in history.

“Interest in computational approaches to the humanities and social sciences dates back to the 1950s,” said Michel, a psychologist in Harvard’s Program for Evolutionary Dynamics. “But attempts to introduce quantitative methods into the study of culture have been hampered by the lack of suitable data. We now have a massive dataset, available through an interface that is user-friendly and freely available to anyone.”

What can it actually do? Just take two examples. First, an analysis of the changing nature of fame over the last two centuries.

By looking at the frequency of famous people’s names in literature, they showed that celebrities born in the mid-20th century tended to be younger and more famous than those of the 19th century, but their fame lasted for a shorter period of time. By 1950, celebrities were achieving fame, on average, when they were 29, compared with 43 for celebrities around 1800. “People are getting more famous than ever before,” wrote the researchers, “but are being forgotten more rapidly than ever.”

Another example: tracking censorship across different cultures.

The database can also identify patterns of censorship in the literature of individual countries. The Jewish artist Marc Chagall, for example, was mentioned only once in the entire German literature from 1936 to 1944, even though his appearance in English-language books grew around fivefold in the same period. There is also evidence of censorship in Chinese literature when it comes to Tiananmen Square and in Russian books with regard to Leon Trotsky.
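Both examples come down to the same basic measurement: how often a name appears in the books of a given year, relative to everything printed that year. The following is only a minimal sketch of that idea, not the researchers' actual method. The function `yearly_frequency` and the two-sentence `toy_corpus` are hypothetical, invented here for illustration, and the sketch handles single-word names only.

```python
from collections import Counter


def yearly_frequency(corpus, name):
    """Return {year: share of that year's words that equal `name`}.

    `corpus` is a hypothetical iterable of (year, text) pairs. The real
    tool works on millions of scanned books, not a short list like this.
    """
    hits = Counter()    # occurrences of the name, per year
    totals = Counter()  # total word count, per year
    target = name.lower()
    for year, text in corpus:
        # Split on whitespace and strip surrounding punctuation.
        words = [w.strip('.,;:!?"\'()').lower() for w in text.split()]
        totals[year] += len(words)
        hits[year] += sum(1 for w in words if w == target)
    # Relative frequency, so years with more books are comparable.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Invented sample sentences, for illustration only.
toy_corpus = [
    (1930, "Chagall painted in Paris. Critics praised Chagall."),
    (1940, "The exhibition opened in Berlin without fanfare."),
]

freq = yearly_frequency(toy_corpus, "Chagall")
for year, share in freq.items():
    print(year, round(share, 3))
```

Dividing by the total word count for each year matters because far more books were printed in some years than in others. A sudden drop in a name's share within one language, while it rises in another, is the kind of pattern the censorship example describes.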

Google will know us better than we know ourselves. That is probably true already, just from its analysis of what we have searched for over the last few months.

