Mind The (African and Gender) Gaps
Wiki Loves Women is a multi-country, multi-faceted project that aims to encourage the contribution of content that celebrates the influence of women leaders and reflects the realities faced by women and girls across Africa. It is designed to help bridge two gaps on Wikipedia and other Wikimedia projects: women and Africa.
The Gender Gap
The gender gap has long been known to exist in computer-related occupations, and indeed the Wikimedia community was aware of this issue from the very early days. The first large-scale publication that attempted to quantify the gap was a survey conducted by the United Nations University and published in March 2010. The study of Wikipedia’s contributor base showed that barely 13 percent of contributors were women.
Since then, the Gender Gap has been further documented and many initiatives have been set up to counter it. The gap is complex because it is expressed at several levels.
For example, coverage bias captures differences between the number of notable women and men portrayed on Wikipedia. Structural bias quantifies gender-specific tendencies to link articles about notable people to articles about people of the same or a different gender. Lexical bias reveals inequalities in the words used to describe notable men and women on Wikipedia. Visibility bias reflects how many articles about men or women make it to the front page of Wikipedia.
LESS THAN 20% OF (ALL) WIKIPEDIA CONTRIBUTORS ARE FEMALE
Numerous studies show that contributors to Wikipedia are more likely to be male than female (the reported percentage of women varies between 7 and 25%, depending on the language edition and the study). A good summary of studies and surveys on the topic can be found on the Wikimedia Foundation blog (April 2015).
ONLY 16% OF THE BIOGRAPHIES ON THE ENGLISH WIKIPEDIA ARE ABOUT WOMEN
WikiProject Women in Red is a Wikipedia project whose objective is to turn “redlinks” into blue ones within the project scope (that is, to create missing biographies). The project scope includes women, real and fictional, their biographies and their works, broadly construed. Women in Red participants argue that the “content gender gap” is a form of systemic bias, and they want to address it in a positive way. They do this by hosting edit-a-thons on various topics and by promoting the project's scope and objective via social media.
IN THE ENGLISH WIKIPEDIA AN ARTICLE ABOUT A NOTABLE PERSON THAT MENTIONS THAT THE PERSON IS DIVORCED IS 4.4 TIMES MORE LIKELY TO BE ABOUT A WOMAN RATHER THAN A MAN
A study established topical and linguistic bias in the way articles about men and women are categorised. While it is well known that topical and linguistic biases exist, it was unknown to what extent these biases manifest in Wikipedia. To answer this question, the authors compared the overview (the lead section) of biographies about men and women in the English Wikipedia.
According to Wikipedia, the “lead section” should stand on its own as a concise overview of the article’s topic. It should define the topic, establish context, explain why the topic is notable, and summarize the most important points.
They focused on the lead section for two reasons. On one hand, the first part of the article is potentially read by most people who look at the article. On the other hand, Wikipedia editors need to focus on what they consider most important about the person, and biases may drive this selection process.
To unveil topical biases that manifest in Wikipedia content, they analysed the following three topics:
- The gender topic contains words that emphasise that someone is a man or woman (i.e., man, women, mr, mrs, lady, gentleman) as well as sexual identity (e.g., gay, lesbian).
- The relationship topic consists of words about romantic relationships (e.g., married, divorced, couple, husband, wife).
- The family topic aggregates words about family relations (e.g., kids, children, mother, grandmother).
Results clearly showed that, across all language editions studied, almost all words that fall into the Family, Relationship or Gender categories have a high likelihood ratio for women. In the English Wikipedia, an article about a notable person that mentions that the person is divorced is 4.4 times more likely to be about a woman than about a man. Similar results were observed in all six language editions.
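To make the notion of a per-word likelihood ratio concrete, here is a minimal sketch in Python. It assumes the ratio is defined as P(word appears in lead | woman) / P(word appears in lead | man), with add-one smoothing; this is one common formulation and not necessarily the exact estimator used in the study. The function name `likelihood_ratios` and the toy corpus are hypothetical and only illustrate the arithmetic.

```python
import re
from collections import Counter


def likelihood_ratios(leads, smoothing=1.0):
    """Return {word: P(word in lead | woman) / P(word in lead | man)}.

    `leads` is an iterable of (gender, text) pairs, gender being "woman" or "man".
    Assumption: each word is counted once per article (document frequency),
    and add-`smoothing` smoothing avoids division by zero.
    """
    doc_counts = {"woman": Counter(), "man": Counter()}
    totals = Counter()
    for gender, text in leads:
        totals[gender] += 1
        # Use a set so a word repeated in one lead section counts once.
        doc_counts[gender].update(set(re.findall(r"[a-z]+", text.lower())))

    vocabulary = set(doc_counts["woman"]) | set(doc_counts["man"])
    ratios = {}
    for word in vocabulary:
        p_woman = (doc_counts["woman"][word] + smoothing) / (totals["woman"] + 2 * smoothing)
        p_man = (doc_counts["man"][word] + smoothing) / (totals["man"] + 2 * smoothing)
        ratios[word] = p_woman / p_man
    return ratios


# Made-up lead sections, for illustration only (not data from the study).
toy_leads = [
    ("woman", "actress married divorced"),
    ("woman", "politician divorced"),
    ("woman", "scientist"),
    ("man", "politician married"),
    ("man", "footballer"),
    ("man", "scientist divorced"),
]

ratios = likelihood_ratios(toy_leads)
for word in sorted(ratios, key=ratios.get, reverse=True):
    print(f"{word}: {ratios[word]:.2f}")
```

A ratio above 1 means the word is over-represented in lead sections about women, and a ratio below 1 means it is over-represented in lead sections about men. When the two groups are weighted equally, the ratio can be read as how many times more likely an article containing the word is to be about a woman than about a man.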
A lexical bias is indeed present on Wikipedia and can be observed consistently across different language editions (Source).
The Africa Gap
The African gap is somewhat easier to establish at the content level, but complex at the contributor level. In any case, it has been little documented in academic research.
Regardless, a few figures are interesting to keep in mind…
MORE EDITS ORIGINATE FROM HONG KONG EACH QUARTER THAN DO FROM THE ENTIRE CONTINENT OF AFRICA OVER THE SAME PERIOD
Much of this figure can actually be explained by Internet population (i.e. the total number of Internet users in a country). However, even accounting for their generally low Internet populations, most countries in Sub-Saharan Africa still fall below their expected number of edits (source).
THERE ARE, ON AVERAGE, 100 TIMES MORE GEOTAGGED ARTICLES ON FRANCE THAN IN AFRICA
A geotagged article is one that is attached to geographical coordinates, typically an article about a building. According to Graham and Foster (source):
The visual above shows the absolute number of within-region-edits (both anonymous and registered) to geolocated articles in the English-language Wikipedia, by world region. In terms of raw numbers, North America and Europe drastically outnumber the remaining world regions. Conversely, Latin America, MENA, and Sub-Saharan Africa all commit only a very small number of within-region-edits. What this means is that even a relatively small number of edits flowing into those regions from outside (i.e. allochthonous contributions) could easily drown out local voice from places like Sub-Saharan Africa: something we undoubtedly see happening.
ONLY 25% OF EDITS TO SUBJECTS ABOUT THE SUB-SAHARAN REGION COME FROM WITHIN THAT REGION
A research project investigated the topic by looking at the proportion of edits made from within a region to Wikipedia articles. On the vertical axis of the figure we can see a clear division between regions that are largely able to define themselves and regions that are largely defined by others.
According to Graham, Sub-Saharan Africa, Middle East & North Africa, and Latin America & Caribbean receive comparatively few edits from within their territories (around 25 percent). Europe, Oceania and North America, on the other hand, receive edits primarily from within (around 75 percent). Asia is edited from within and from outside to almost equal degrees. In other words, there are significant parts of the world in which a majority of content is not locally generated.
More details per country may be found here. They suggest that Sub-Saharan Africa's figures are boosted by relatively high numbers for South Africa, Uganda, Mauritius, Rwanda, and Zimbabwe. Sub-Saharan Africa’s figures would be even lower without the effects of those five countries.
Also, 67% of the edits committed from Sub-Saharan Africa stay within the region. Even though editors from Sub-Saharan Africa spend most of their edits within their region, their small numbers mean that most content still comes from elsewhere.
An exploration of biographies tends to show that the gap is much larger for people born between the 19th century and the mid-20th century.
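The two percentages above measure different things, and a small worked example may help keep them apart. The sketch below assumes a table of edit counts keyed by (region the edit comes from, region the article is about). The function name `local_shares`, the region labels, and all numbers are invented for illustration; they are not the research data, and were only chosen so that the smaller region reproduces the pattern described above.

```python
def local_shares(edits):
    """Compute two local-edit shares per region.

    `edits[(source, target)]` is the number of edits made from `source`
    to articles about `target` (hypothetical data layout).
    """
    regions = {region for pair in edits for region in pair}
    shares = {}
    for region in regions:
        local = edits.get((region, region), 0)
        # All edits to articles about this region, wherever they come from.
        received = sum(n for (_, target), n in edits.items() if target == region)
        # All edits made from this region, whatever they are about.
        committed = sum(n for (source, _), n in edits.items() if source == region)
        shares[region] = {
            "received_from_within": local / received if received else None,
            "committed_staying_within": local / committed if committed else None,
        }
    return shares


# Invented counts: a large editing community and a small one.
toy_edits = {
    ("region_a", "region_a"): 750,
    ("region_a", "region_b"): 150,
    ("region_b", "region_b"): 50,
    ("region_b", "region_a"): 25,
}

shares = local_shares(toy_edits)
for region in sorted(shares):
    print(region, shares[region])
```

In this toy table, editors in `region_b` keep two thirds of their edits within their own region, yet only a quarter of the edits to articles about `region_b` are local, because the much larger community in `region_a` contributes more edits in absolute terms.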
Where the gaps create an abyss
ONLY 12% OF BIOGRAPHIES IN SUB-SAHARAN AFRICA ARE ABOUT WOMEN
WIGI is a project producing an open data set about the gender, date of birth, place of birth, ethnicity, occupation, and language of biography articles in all Wikipedias. In its study of gendered biographies in Wikidata by culture, culture is determined by translating ethnic group, place of birth, and citizenship into one of nine world cultures, as per the Inglehart-Welzel map of the world, with the help of Mechanical Turk. One of these cultures corresponds more or less to Sub-Saharan Africa.
44% OF WIKIDATA-RECORDED BIOGRAPHIES OF WOMEN IN CAMEROON ARE ABOUT SPORT
Records in Wikidata and biographies in Wikipedia poorly reflect the diversity of notable women's occupations.
The largest share of content relates to sport, followed by politicians, then actresses and singers. This bias may be due to an imbalance in editors' interests; the availability (or lack thereof) of public sources for other professions; the relative ease of writing about these occupations compared with more sensitive ones (such as “writer” or “businesswoman”); and the interest of Wikimedians in organizing edit-a-thons (co-writing events) or photo hunts around major sporting events, public film events, or political elections.
Wikidata is an interesting way to explore records of the occupations of notable people and to compare the records available for women versus men, in particular in Africa. Several visualizations were also made available to better evaluate and understand the current gap during a Wikidata workshop organized at SUPSI in early February 2017.
Interesting links to explore