"Rohdaten ist ein Oxymoron" (BD2015)
Categories are indispensable
Excerpt from Geoffrey C. Bowker: The Theory/Data Thing. International Journal of Communication 8 (2014), 1795–1799
So a two-part question: do we need theories, and do theories need categories?
In The Fragile Absolute: Or, Why Is the Christian Legacy Worth Fighting For? Žižek (2009) provides one way in to these questions. Take the social dimension first. If we accept the underlying ontology that we are all individuals (atoms) who aggregate in unnamed clusters rather than categories, then Žižek argues that we certainly lose the ability to recognize constant and meaningful forces in “society” (which I’ll put in scare quotes for the nonce).
It does not just happen that there is a net protein, natural resource drain from the Third World to the First, nor that women in the United States are consistently paid less than men for the same quality of work. These categories represent a reality. Certainly, they should not be essentialized. The Third World/First World divide overlooks regions of intense underdevelopment in, say, the United States and regions of vast wealth in, say, India.
Similarly, “woman” is a category that can and should be questioned. And yet . . . the rough, aggregate truth is that there is not a level playing field for either, broadly construed. No data deluge will explain these truths — at best, it can help direct policies to mitigate the injustice; at worst (and most commonly), it can deny that there are indeed broad social forces. Willy-nilly, our social world is one in which categories have deep meaning.
This is not just about the social truths: The same can be argued for truths in the natural sciences. A category system like the species concept is indeed highly problematic (Wilkins, 2011); however, the aggregate behavior of most entities can be described along certain dimensions as if this categorization were real. In both cases, the world is structured in such a way as to make the categories have real consequences.
So in some ways, categories are central to being in the world. Big data does not do away with categories at all. As I have argued elsewhere, the term “raw data” is itself an oxymoron. Antonia Walford (2012) writes about the work it takes to turn data from sensors in the Amazon rain forest into manipulable data within databases. There is a plenum of data: For her, the art of the scientific database is to take this undifferentiated onslaught and conjure it into models (structured data fields, metadata) that allow Amazon data to circulate scientifically.
As Derrida (1998) argues in Archive Fever and Cory Knobel (2010) so beautifully develops with his concept of ontic occlusion, every act of admitting data into the archive is simultaneously an act of occluding other ways of being, other realities. The archive cannot in principle contain the world in small; its very finitude means that most slices of reality are not represented. The question for theory is what the forms of exclusion are and how we can generalize about them.
Take the other Amazon as an illustration. If I am defined by my clicks and purchases and so forth, I get represented largely as a person with no qualities other than “consumer with tastes.” However, creating a system that locks me into my tastes reduces me significantly. Individuals are not stable categories—things and people are not identical with themselves over time. (This is argued in formal logic in the discipline of mereology and in psychiatry by, say, ethnopsychiatry.) The unexamined term the “individual” is what structures the database and significantly excludes temporality.
Two things, then. Just because we have big data does not mean that the world acts as if there are no categories. And just because we have big (or very big, or massive) data does not mean that our databases are not theoretically structured in ways that enable certain perspectives and disable others.
The lesson of the "king bees"
Excerpts from: Geoffrey Bowker and Paul N. Edwards: “Raw Data” Is an Oxymoron. London 2013
With a “natural” thoroughly separate from us, we can learn lessons from the book of nature and apply them deliberately to our own species. An absurdist statement of this move is given by the Natural Law Party—a mix of transcendental meditation and benign autocratic practice. A scientific guise for our times has been sociobiology—if all animals have a territorial imperative, then so must we. The sleight of hand (discussed by Bruno Latour in We Have Never Been Modern) which permits this appeal to the natural to be true is that our own understandings of nature project our views of ourselves.
Beehives in nineteenth-century Britain had kings, because it was believed that only a male could undertake the complex tasks of government (the titular monarch Queen Victoria notwithstanding). Our knowledge professionals see selfish genes because that’s the way that we look at ourselves as social beings; if the same amount of energy had been applied to the universality of parasitism/symbiosis as has been applied to rampant individualistic analysis, we would see the natural and social worlds very differently. However, scientists tend to get inspired by and garner funding for concepts that sit “naturally” with our views of ourselves. The social, then, is other than the natural and should/must be modeled on it; and yet the natural is always already social.
Database development has followed this vein. The early databases were hierarchical: you needed to go down a detailed line of authority each time you wanted to retrieve a datum. Then we had relational databases, where there was still central control but much more flexible access (the database system, like society at the time, was seen as a fixed structure). Today we have moved into a world of object-oriented and object-relational databases, in which each data object lives in a Tardean paradise: any structure can be evanescent providing we know the inputs or outputs of any object within it.
...
Along the way, we have conceived ourselves and the natural entities in terms of data and information. We have flattened both the social and the natural into a single world so that there are no human actors and natural entities but only agents (speaking computationally) or actants (speaking semiotically) that share precisely the same features. It makes no sense in the dataverse to speak of the raw and the natural or the cooked and the social: to get into it you already need to be defined as a particular kind of monad.
There is of course an ongoing relationship with the real world and the human observer (nature and society); however, it is a difficult one to express. Both the natural world and its human observers are being ever more instrumented with intelligent machines. Staggering arrays of sensors and cameras furnish “us” with terabytes of data a day about the natural world and about our social activities. The “quantified self” movement is an oddly worshipful effort to celebrate this quantification (computers do not deal with “soft” data). The qualified self seems to be slipping out of the picture: the interpretative work is done inside the computer and read out and acted on by humans.
A dark vision is that our interaction with the world and each other is being rendered epiphenomenal to these data-program-data cycles. If it’s not in principle measurable, or is not being measured, it doesn’t exist. Thus in the natural world, we have largely as a species elected to take the quantifiable genome (https://www.23andme.com) as the measure of all life: when we save species (in seedbanks for example), we are saving irreducible genetic information — not communities (despite the fact that every individual comes with its own internal flora and fauna central to its survival; and that each individual can be understood equally as the product of a network of relationships).
Collectivities that are not being measured and modeled are preserved, if at all, only accidentally. As people we are, in Olga Kuchinskaya’s memorable phrase, becoming our own data. Mental disorders are less complexes than strings of measurable effects. By making them data, response regimes can be tested and implemented. However, this does not mean that completely different understandings of these disorders are not right — just that the complex, tight coupling between machines in the clinical and insurance industries and in administration entails that in order to survive in the world,