Thursday, December 22, 2011

The Materiality of Virtuality: Internet Reporting on Arab Revolutions

For years, feminists and women of color have argued that materiality matters. Even in the digital world, the idea that data should be abstracted and free-floating reinforces the authority of privileged, white male practices. For literary critic N. Katherine Hayles, "[This] leads to a strategic definition of 'virtuality.' Virtuality is the cultural perception that material objects are interpenetrated by information patterns. The definition plays off the duality at the heart of the condition of virtuality—materiality on the one hand, information on the other."[1] Virtuality, as such, is a bridge between the material world and a purely unstructured world of data.

Hayles’ argument is made visible in this photograph, remixed from today’s Guardian article “Image of unknown woman beaten by Egypt's military echoes around world” by Ahdaf Soueif. In this collection of images, I use a remixed, virtual representation to emphasize the material presence of the political, lived experience of gender and identity. The photograph’s power lies in its ability to communicate physical brutality happening on the ground, in the material present.

[Fig. 1. A graphic remix, Egyptian Body Remixed - virtual materiality, to illustrate the concept of the materiality of virtuality.]

Virtuality is often opposed to the real in discussions of cyberspace, media, and other spaces where images take precedence. However, this opposition is false. Virtuality exists somewhere between actual, physical existence and non-existence. It is a negotiation between materiality and information, and virtuality can appear only in the context of that negotiation. Presence is already virtual: its transfer into technological realms like the Web or video is an easy relocation. What is lost in this transfer is the instantiation of presence in an actual physical body. Yet information must always be instantiated in a medium, and it is precisely the medium that changes from physical to technological, giving the impression that something is lacking in the technologically instantiated image. The virtual, in other words, still has power. It has a reality that can affect us, but only to the extent that its power is generated by belief that leads to action.[2]

As twelve Arab nations undergo uprisings and revolutions, an emergent twenty-first-century public has been reporting, documenting, collecting, archiving, publishing, and producing copious amounts of digital media on the region. In my work on these emerging online spaces, I bypass the notion of the critic as an authority who controls the narrative. I do so in order to create a new role in the transnational Arab community, one that resonates with Web culture: functioning as critic, curator, and artist at the same time. This cyber-conscious digital art practice allows me, as VJ Um Amel and in other technoscapes, to shift between roles and occupy these subjectivities simultaneously.[3] Below is an animated example of an attempt to make material the virtual sentiment about Tahrir Square posted on Twitter and YouTube last month. My decision to embed a few Twitter posts tagged #NOSCAF and #NoMilTrials in the remix was an artistic gesture, not a truth claim.

It is critical to establish the relevance of cyberspace's virtuality. Once that relevance is established with credibility, informed eyes can consider how to unpack the data and work toward transforming information into insight. The urgency of Internet archiving makes it forward-looking: virtual archiving does not just look back and store; it strives to unpack, encode, and decipher a live stream of communication in the present.

My concern in this article is that one cannot have analytics without a human viewpoint to shape them. The slippage happens when data, in and of itself, is mistaken for knowledge. Drawing information out of unstructured data already involves a layer of encoding. Thus, before we even begin to produce knowledge from data, we must first ask: How is the data being collected? How do we extract it? And how do we then organize what we have extracted? Only after extraction and organization does data become intelligible information. Only then can the human researcher begin her inquiry into the insights and knowledge produced, so that she may make use of them. To illustrate this point, below is a diagram adapted from the work of design theorist Nathan Shedroff that traces the flow from data to insight. Each layer adds another level of encoding, deciphering, and subjectivity.
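The collect, extract, and organize steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not R-Shief's or IBM's code: it assumes posts have already been collected as plain strings, and the names `extract_hashtags` and `organize` are hypothetical. Even in this toy version, each step encodes a choice; lower-casing, for instance, silently merges #Jan25 and #jan25 into a single category.

```python
import re
from collections import Counter

# Hashtag pattern: "#" followed by word characters (Unicode-aware, so Arabic letters match too).
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(post: str) -> list[str]:
    """Extract: pull hashtags out of one unstructured post, lower-casing them."""
    return [tag.lower() for tag in HASHTAG.findall(post)]


def organize(posts: list[str]) -> Counter:
    """Organize: tally hashtags across all collected posts into a frequency table."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(extract_hashtags(post))
    return counts


if __name__ == "__main__":
    # Collect: hypothetical sample posts standing in for data gathered from an API.
    sample = ["Back in #Tahrir today #Jan25", "#jan25 continues", "No tags here"]
    print(organize(sample).most_common())  # prints [('jan25', 2), ('tahrir', 1)]
```

The resulting frequency table is information in the diagram's sense: countable and intelligible, but not yet insight.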

[Fig. 2. This graph was created using Illustrator, and the cartoon is courtesy of Carlos Latuff.]

Because over sixty percent of Twitter posts about the Middle East are in Arabic,[4] it would make sense to conduct sentiment or semantic analysis in Arabic. However, this is not the case in practice: the few tools built to process large quantities of data are not compatible with Arabic. This is one reason why many data visualizations of the 2011 Arab uprisings and revolutions rely on traditional content analysis of quantifiable items rather than on semantic interpretation of what those items mean.

In the example below, I worked with a team of developers from IBM to modify their Natural Language Processing[5] tools to work in Arabic. The bubble chart graphs public sentiment on potential presidential candidates: green signifies “positive” sentiment and red signifies “negative” sentiment. Natural language processing techniques allowed us to quantify sentiment in data generated from roughly 500,000 tweets on #Egypt in June 2011. While this graph is one of the only data visualizations that works in Arabic from end to end, from the user to the graph, it is still not accurate. Without paying much attention to the complicated derivations of Arabic words, I manually created lists of about twenty positive and twenty negative Arabic words to determine what might count as positive or negative sentiment. Later, when I met Algerian computer scientist Taha Zerrouki in Cairo, he told me about a list of 13,000 stop words he had created for a similar project. By comparison, my twenty words are simply inadequate. To complete this initiative, I must work with Arabic language experts who also have technical knowledge.
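The word-list approach described above amounts to lexicon-based sentiment scoring: count the positive words, subtract the negative ones. The sketch below is a hypothetical illustration, not the IBM tooling; its three-word lexicons are placeholders for the twenty-word lists, and the normalization step (removing short-vowel diacritics and unifying alef variants) is a common Arabic preprocessing step added here as an assumption.

```python
import re

# Arabic short-vowel and related marks, fathatan (U+064B) through sukun (U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")
# Alef variants with madda or hamza (U+0622, U+0623, U+0625), mapped to bare alef (U+0627).
ALEF_VARIANTS = re.compile("[\u0622\u0623\u0625]")


def normalize(text: str) -> str:
    """Strip diacritics and unify alef variants so spelling variants match the lexicon."""
    return ALEF_VARIANTS.sub("\u0627", DIACRITICS.sub("", text))


# Hypothetical mini-lexicons standing in for the ~20-word lists described in the text.
POSITIVE = {normalize(w) for w in ("ممتاز", "جيد", "رائع")}  # excellent, good, wonderful
NEGATIVE = {normalize(w) for w in ("سيء", "فاسد", "فشل")}    # bad, corrupt, failure


def sentiment_score(text: str) -> int:
    """Return positive hits minus negative hits: >0 reads as positive, <0 as negative."""
    # Normalize before tokenizing, because \w does not match combining diacritics.
    tokens = re.findall(r"\w+", normalize(text))
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    return pos - neg


if __name__ == "__main__":
    print(sentiment_score("الوضع ممتاز"))  # 1
    print(sentiment_score("ممتازة"))       # 0: the feminine form is missed
```

The last line shows the weakness noted above: the feminine form ممتازة is a different surface string from ممتاز, so it scores zero. Capturing such derivations requires morphological analysis or far larger, expert-built lists.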

[Fig. 3. This research came out of my work at USC’s Annenberg Innovation Lab.]

That is not to say that data-driven journalism is not growing rapidly in the region. According to a study by the Paris-based agency Semiocast, Arabic is the fastest-growing language on Twitter, with Arabic tweets increasing by two thousand percent in the last twelve months alone. While Western experts often focus on analyzing the datasets, people in the region have launched numerous initiatives, often volunteer-driven, to collect stories about their own revolutions. In Egypt, there are already several archive initiatives, such as Tahrir Documents, Jan25 National Archives, Tahrir Diaries, and 18 Days in Egypt. Several important open data initiatives have also been underway in the region for years. These include the Arabic FOSS network; the Social Media Exchange in Beirut; Arab Digital Expression Foundation, co-founded by journalist Ranwa Yahyia and Ali Shaath; Arab Techies, co-founded by political prisoner Alaa Abd El-Fattah and technologist Manal Hassan; and Global Voices, founded by activist Sami Ben Gharbia. The point is that technology has enabled public-driven, real-time media reporting. However, the speed and scale of this new technological production impede the human ability to draw accurate, informed, and useful insights from what has been gathered.

In my research, I have raised concerns about Arabic software localization, the process of adapting computer software to different languages and regional conventions. Localization involves building spell checkers and localization tools, creating keyboard layouts and fonts, generating locales and terminologies, localizing software, and training new localizers. At an even deeper level, the shift in programming from C++ (a highly mathematical language) to Java (a language that grew out of network computing and contains a great deal of English-based vocabulary) has meant a shift toward what is culturally English-based. A few years ago, these issues were especially critical. To address this apparent need, in 2008 I created an Arabic-English archive named R-Shief, which has since developed into a digital lab that collects and analyzes Middle East content on the Internet.

Over the last fifteen months, R-Shief has collected over 188 million tweets containing selected hashtags and usernames from Twitter’s public search API, which, according to Twitter, yields only a single-digit percentage of total Twitter traffic. R-Shief began archiving tweets by hashtag during the August 2010 Israeli attack on the Gaza flotilla because Twitter’s search access is limited to the past “6-9 days of Tweets.” More than a year later, R-Shief’s hashtag selection process has evolved from choosing hashtags manually to selecting them algorithmically, a method[6] carried out by a cloud of computers processing over eight hundred hashtag feeds by the second.
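Because the search window covers only the past several days, an archive has to poll repeatedly and merge each batch without storing duplicates. The sketch below assumes each tweet arrives as a dictionary with an `id` and a list of `hashtags`; those field names and the `archive_batch` function are hypothetical, and the sketch does not model R-Shief's algorithmic hashtag selection.

```python
def archive_batch(archive: dict, batch: list, tracked: set) -> int:
    """Merge one polled batch into the archive, keyed by tweet id.

    Only tweets carrying at least one tracked hashtag are kept; `tracked` holds
    lower-case hashtags without the leading '#'. Returns the number of newly stored tweets.
    """
    added = 0
    for tweet in batch:
        tags = {tag.lower() for tag in tweet.get("hashtags", [])}
        if tags & tracked and tweet["id"] not in archive:
            archive[tweet["id"]] = tweet
            added += 1
    return added


if __name__ == "__main__":
    archive: dict = {}
    tracked = {"gaza", "flotilla"}
    # Two overlapping polls of a limited search window (hypothetical data).
    first = [{"id": 1, "hashtags": ["Gaza"]}, {"id": 2, "hashtags": ["weather"]}]
    second = [{"id": 1, "hashtags": ["Gaza"]}, {"id": 3, "hashtags": ["flotilla"]}]
    print(archive_batch(archive, first, tracked))   # 1
    print(archive_batch(archive, second, tracked))  # 1 (tweet 1 is already stored)
    print(sorted(archive))                          # [1, 3]
```

Keying the archive on the tweet id makes repeated polling idempotent: overlapping search windows never store the same tweet twice, so nothing is lost when older tweets age out of the search window.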

In addition, the site archives forty-three public Facebook pages, twenty-six blogs and forums, and twenty-eight websites, including 789 articles with over 5,000 comments from […]. Yet, with all the petabytes[7] of information stored and analyzed by the lab, R-Shief’s repository is only a fraction of the information produced on the Internet. Simply put, this rate of production, in both speed and scale, exceeds the human capacity for complete consumption. It has become an issue of seeking solutions over seeking truths.

On many levels, the information age is defined by virtual scale and real-time speed. Thus, the rate and scale of virtual production mean that technological interpretation is inevitable. In his book Speed and Politics, Paul Virilio asks: If the world is run by the engine of capitalism, then why has its continuing acceleration not stopped at the limit of the realization of capital? His answer is that what drives our technocratic society is not capitalism but militarism: the dromological state, the state of movement.[8]

Today, the irony is that amid Arabic cries for justice, democracy, dignity, and freedom, Western interpretations of social media data on the Arab uprisings are privileged and continue to predominate. In addition, the material experience of new media, virtuality, is easily misunderstood. It remains tricky to explain because virtuality both exists and does not exist in the material world. Thus, in my opinion, shaping or finding usable information in this mass of content requires innovative approaches. As I mentioned, we still face the challenge of localizing Arabic language tools that are freely available as open source. Perhaps the most dangerous slippage of all is mistaking data for knowledge, especially when these believable truths become cause for action.

The concerns and challenges outlined in this article have led me to an ethical imperative: to seek a forum where interlocutors collaborate with informed experts from various fields concerned with the Middle East, such as economics, political science, and art, on research projects using large-scale quantitative data analysis. In such a forum, analysts could write brief descriptions of how they think quantitative work might best be formulated for this discipline. Ultimately, the central questions are “How can we gain insight from the virtual world?” and “What is the role of ethics in new models of communication and social interaction?” In a post-Arab Revolution, post-Occupy Wall Street, post-Twitter world, where are our value economies?

[1] N. Katherine Hayles, How We Became Posthuman: Virtual Bodies in Cybernetics, Literature, and Informatics (Chicago: U of Chicago P, 1999) 3. According to her definition, the posthuman view “configures the human being so that it can be seamlessly articulated with intelligent machines. In the posthuman, there are no essential differences or absolute demarcations between bodily existence and computer simulation, cybernetic mechanism and biological organism, robot teleology and human goals.”

[2] Religious beliefs and rituals are perhaps the most pervasive examples of engagement with the virtual. Religious altars explicitly engage with the virtual as a reality that cannot be seen, or that cannot be proved except through belief in its existence and engagement with it.

[3] Originally written in "A VJ Manifesto," by VJ Um Amel (April 30, 2011).

[4] According to R-Shief Labs, this is true of most non-Francophone Arab nations, in particular, of Egypt and Syria. Tunisian and Moroccan tweets are predominantly in French, while, interestingly, tweets that include “#Libya” are mainly in English.

[5] Natural Language Processing is a field of computer science and linguistics that uses computational methods to extract meaning from unstructured data. Examples include word and sentence tokenization, text classification and sentiment analysis, spelling correction, information extraction, parsing, meaning extraction, and question answering.

[6] Swarm computing, also known as “swarm intelligence,” refers to the collective behavior of decentralized, self-organized systems, natural or artificial. The concept is employed in work on artificial intelligence.

[7] A petabyte is a unit of information equal to one quadrillion (short scale) bytes, or 1,000 terabytes.

[8] Paul Virilio, Speed and Politics (New York: Semiotext(e), 1986) 26.