
Big (Bad) Data: Hidden Gender Biases

Written By: Vrinda Sood and Abhinaya Mathivanan


Edited By: Nicole Anne Hia and Brenda Tan


Blog Cover Designed By: Catharina

We live in a world that is closely intertwined with technology and heavily driven by big data. It is all around us as we speak. From wearables (e.g. Apple Watch or Fitbit) that are capable of monitoring health conditions to personalized marketing, these technological achievements are made possible by amassing and analysing great amounts of data (i.e. big data). While these often lead to inventions that improve our quality of life, it would be naive of us to ignore the repercussions they bring as well.


In particular, data and tech have made it easier for existing inequalities to morph and mutate into bigger, more dangerous monsters. The question, then, is - how did we get here?

Let’s start from the very beginning.

Most data analysis is carried out by gathering information and identifying patterns to establish what is normal. But, what exactly is normal? And who, or what, determines it?


Some might argue that it’s the consumers who decide, but ultimately, it is the tech industry - its developers, engineers and product managers - that calls the shots.

But, when the technological realm attracts and supports more men than women, it inevitably creates an innate bias that permeates every step of our process - from the moment we conceive an idea to how we collect, interpret and manipulate the data. This is troubling for both men and women as the data feeds into inequalities (e.g. patriarchal values and stereotypes) present in the systems and structures that we live in today.

British feminist author Caroline Criado Perez describes an example of this in her book, Invisible Women. She discusses an unnamed tech company that sought to reduce the incidence of accidental falls among senior citizens and wanted to develop an app that detected falls, using as its baseline the assumption that a person’s phone is held close to their body. However, this baseline failed to take into account the fact that most women use bags or purses and hence may not have their phones close enough to their bodies for the app to detect the risk of a fall. Despite the well-meaning intentions of the developers, it is clear that women were, and continue to be, left out in the ideation process.


The idea that there is a male-centric view around which the world is built is not simply a war cry calling out harm, but rather, a call to draw attention to the subtle, inherent biases in our lives that appear to make non-male needs invisible.

There are many reasons for scientific and technological innovations to feature men as their default. But, one problem that arises as a fundamental consequence is the unequal representation of gender.


Let’s take a minute to imagine what life is like for someone who epitomizes the dominant group in data science: a straight, white, cisgender man with formal technical credentials who lives in the United States. When he looks for a home or applies for a credit card, people are eager for his business. People smile when he holds his girlfriend’s hand in public. His body doesn’t change due to childbirth or breastfeeding, so he does not need to think about workplace accommodations. He presents his social security number in job applications as a formality, but it never hinders his application from being processed or brings him unwanted attention. The ease with which he traverses the world is invisible to him because it has been designed for people just like him. He does not think about how life might be different for everyone else. In fact, it is difficult for him to imagine that at all.


The fact that the STEM industry is largely dominated by men leads to a privilege hazard: male product and service developers are not always able to see beyond their own experiences to consider the perspectives and issues faced by users of other genders.


Even outside the hallowed halls of academia, studies on gender have shown that from written languages to the environment, gender bias is heavily ingrained in our lives. This has even seeped into textual and visual datasets. When information becomes data in the name of Science, otherwise debatable information is converted into a solid basis for making subsequent claims, thereby creating continuous cascading effects in women’s lives.


Take the hiring platform Gild as an example. It combs through social data when looking at a candidate’s profile: the traces that potential candidates leave behind on the websites they visit may have nothing to do with the job they are applying for, but they are taken into account as candidates are ranked on the basis of the “social capital” they may have within digital communities. This includes looking at their GitHub, a digital community where programmers share their code and grow together. While this seems like an effective, albeit not completely accurate, measure of a candidate’s ability to perform and work in a team, the platform also takes into account other digital footprints as credible data. Frequenting a particular Japanese manga site, for instance, may be treated as a solid predictor of strong coding, notes Cathy O’Neil, who explains this example in her book, Weapons of Math Destruction.


While this assessment of the candidate’s “social capital” could arguably be a valid indicator, it doesn’t take women, or gender, into account. Women may be less likely to engage in such typically male-dominated communities due to the prevalence of ‘sexist speech’, for example. In addition, since this algorithm is trained using existing data on programmers who are mostly male, it makes inaccurate gender-specific correlations that deter women from entering this field.
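To make the mechanism concrete, here is a minimal Python sketch of how a ranking can pick up a skewed signal from its own history. Everything in it is hypothetical: the `past_hires` records, the `visits_forum` field and the scoring rule are invented for illustration and are not how any real hiring platform works.

```python
# Hypothetical records of past hires, drawn mostly from one community.
past_hires = [
    {"skill": 4, "visits_forum": True},
    {"skill": 5, "visits_forum": True},
    {"skill": 3, "visits_forum": True},
    {"skill": 4, "visits_forum": True},
    {"skill": 5, "visits_forum": False},
]


def learn_forum_weight(hires):
    """Return the share of past hires who visit the forum.

    A naive system might treat this share as evidence that the
    forum 'predicts' good programmers.
    """
    visitors = sum(1 for hire in hires if hire["visits_forum"])
    return visitors / len(hires)


def score(candidate, forum_weight):
    """Skill plus a bonus for matching the footprint of past hires."""
    bonus = forum_weight if candidate["visits_forum"] else 0.0
    return candidate["skill"] + bonus


forum_weight = learn_forum_weight(past_hires)

# Two equally skilled candidates; only their browsing habits differ.
candidate_a = {"skill": 4, "visits_forum": True}
candidate_b = {"skill": 4, "visits_forum": False}

print(score(candidate_a, forum_weight))
print(score(candidate_b, forum_weight))
```

Both candidates have the same skill, yet the one who avoids the forum is ranked lower. If women are less likely to spend time in that community, a rule like this quietly penalises them without ever looking at gender.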


Another example of the extent of the effects of bias in the categorisation of data can be seen in what has been touted as a cornerstone development in organising human knowledge: the Dewey Decimal System (i.e. the widely used method to classify and organize books in the library).


When Melvil Dewey came up with the system in 1876, society’s norms largely favoured a certain kind of member - one who was likely to be male and Christian. As a result, the system’s categorisations reflected this as well, with the categories dedicated to the study of Christian and European culture and history far outnumbering those dedicated to other cultures and orientations. Holly Tomren, who authored the paper Classification, Bias, and American Indian Materials, remarked: “In terms of library services, this marginalization negatively impacts the ability of users to successfully retrieve information on these topics. On a larger scale, biased classification systems and subject headings reinforce and perpetuate negative stereotypes in our society.”


So, how can we right these mistakes of the past that have been responsible for much of our functioning as a society today?


An oft-cited solution to this issue is tackling insufficient representation. Adding more women to the workforce will make the source of the data that goes into creating tools for societal development more representative of actual society.


However, when there is already an entrenched impediment to the hiring of women in the workforce; when women in workplaces do not wish to be identified by their gender; or when women face resistance in voicing their needs or sharing gendered differences, the issue still persists.


There are some solutions that address the even more fundamental concepts that come before data creation. One such example is community involvement and education. A women-led high school program called Local Lotto did not treat data science as purely abstract and technical, but modelled a data science that is grounded in solving ethical questions around social inequality and that had relevance for learners’ everyday lives. The project valued its learners’ lived experiences: the learners came in as “domain experts” of their neighborhoods. And, it valued both qualitative and quantitative data: the learners spoke with residents in their neighborhoods and connected the residents’ beliefs, attitudes, and concerns to probability calculations.


Questioning biases at the time of data collection might seem - as a concept - ironclad, until we consider that in a world economy driven by data, data collection itself contributes to the emergence of new power dynamics. Who does this data actually serve? Collecting, storing and managing data is a capital-intensive process that only certain organisations can afford: universities, governments and corporations wield great power in determining what is important, and how resources are allocated to obtain it.


If data is collected to serve these organisations’ own objectives, it again raises the question of whether ‘data and numbers’ are even truly objective to begin with.

Perhaps, then, the first step should be to change the way we establish the beneficiaries of data collection and usage. And, even though this may not be a complete solution, at least it may be a step in the right direction.