This is an examination of an archive of Time magazine containing 3,389 issues ranging from 1923 to 2014, focusing on images of faces.
We extracted 327,322 faces from the archive, categorized all of them by gender, and obtained detailed characteristics of a subset of 8,789 of those faces.
We use computer vision analysis, combined with contextual research and methods from the humanities, to elucidate trends and patterns in the visual culture reflected by the publication. In particular, we are examining how representations of the human face have changed over time, and seeking relationships between the visual features we discover and their corresponding socio-political contexts.
The main outcome of this project will be a meaningful and accessible web-based platform through which both researchers and the general public can explore the archives of Time magazine and gain insights into our cultural history. We expect to be able to apply our methodology to any periodical publication, but we chose Time because it stands as a record of the many pulses of U.S. and world politics and their intersections with American culture. Because Time is such a culturally important and ubiquitous publication, we believe its archives can reveal much about how Americans perceived politics and culture throughout the twentieth and early twenty-first centuries.
Our data consists of labeled faces extracted from a Time magazine archive that contains 3,389 issues ranging from 1923 to 2014. Our entire dataset is freely available and has been published in the Journal of Cultural Analytics. Please refer to the paper for details of our data.
Our published data consists of three subsets: Dataset 1) the gender labels and image characteristics for each of the 327,322 faces that were automatically extracted from the entire Time archive; Dataset 2) a subset of 8,789 faces from a sample of 100 issues that were labeled by Amazon Mechanical Turk (AMT) workers along ten dimensions (including gender, which was used as training data to produce Dataset 1); and Dataset 3) the raw data collected from the AMT workers before it was processed to produce Dataset 2. The complete dataset can be found in the Harvard Dataverse repository.
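The text does not say how the raw AMT responses in Dataset 3 were turned into the per-face labels of Dataset 2. One common technique is a per-question majority vote, sketched below under the assumption that the raw data can be read as `(face_id, question, answer)` tuples. The function name `aggregate_labels` and that tuple layout are hypothetical and do not describe the project's actual processing or file format.

```python
from collections import Counter


def aggregate_labels(raw_responses):
    """Majority vote per (face, question).

    raw_responses: iterable of (face_id, question, answer) tuples; this layout
    is hypothetical, not the actual Dataset 3 format.
    Returns {(face_id, question): (label, agreement)}, where agreement is the
    share of workers who gave the winning answer.
    """
    grouped = {}
    for face_id, question, answer in raw_responses:
        grouped.setdefault((face_id, question), []).append(answer)

    result = {}
    for key, answers in grouped.items():
        # most_common(1) breaks ties in favour of the answer seen first.
        label, votes = Counter(answers).most_common(1)[0]
        result[key] = (label, votes / len(answers))
    return result


raw = [
    ("face_001", "gender", "female"),
    ("face_001", "gender", "female"),
    ("face_001", "gender", "male"),
    ("face_001", "smiling", "yes"),
]
print(aggregate_labels(raw))
```

Keeping the agreement score alongside each label makes it easy to flag faces on which workers disagreed.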
The chart above shows the average number of faces per page in each year. The solid line is a LOWESS-smoothed version of the data.
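As an illustration of how a chart like this could be produced, the sketch below averages face counts per page by year and then applies a basic LOWESS smoother: a locally weighted linear regression with tricube weights, in a single pass without the robustness iterations of the full algorithm. The input layout, the helper names and the `frac` default are assumptions; the project's actual smoothing parameters are not stated. For real work, `statsmodels.nonparametric.smoothers_lowess.lowess` provides a complete implementation.

```python
from collections import defaultdict


def faces_per_page_by_year(pages):
    """Average number of faces per page for each year.

    pages: iterable of (year, n_faces) pairs, one per scanned page
    (a hypothetical input layout).
    """
    totals = defaultdict(lambda: [0, 0])  # year -> [faces, pages]
    for year, n_faces in pages:
        totals[year][0] += n_faces
        totals[year][1] += 1
    years = sorted(totals)
    return years, [totals[y][0] / totals[y][1] for y in years]


def lowess(xs, ys, frac=0.3):
    """One-pass LOWESS: a weighted linear fit around each x0 with tricube weights."""
    n = len(xs)
    k = min(n, max(2, round(frac * n)))  # neighbours that define the local bandwidth
    fitted = []
    for x0 in xs:
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12  # bandwidth
        sw = swx = swy = swxx = swxy = 0.0
        for x, y in zip(xs, ys):
            dx = x - x0  # centre on x0 for numerical stability
            u = abs(dx) / h
            w = (1 - u ** 3) ** 3 if u < 1 else 0.0  # tricube weight
            sw += w
            swx += w * dx
            swy += w * y
            swxx += w * dx * dx
            swxy += w * dx * y
        denom = sw * swxx - swx * swx
        if denom == 0:  # only points at x0 carry weight: use their weighted mean
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            fitted.append((swy - slope * swx) / sw)  # fitted value at dx = 0
    return fitted
```

A typical call would be `years, avg = faces_per_page_by_year(pages)` followed by `lowess(years, avg)`; a smaller `frac` follows short-term fluctuations more closely, a larger one gives a smoother curve.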
Our database of face images includes their associated metadata (year, issue, page number), as well as more detailed data: the face’s gender, race, the context in which the face appears (ad or feature story), whether or not the face is smiling, and whether it is an individual portrait or belongs to an image that contains more than one face.
Data collected for each extracted face:
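One way to picture a single entry in the database is as a small record type holding the metadata and labels listed above. The sketch below is a hypothetical schema: the `FaceRecord` name, the field names, their types and the issue identifier format are assumptions for illustration and do not reflect the published file format, which is documented in the Journal of Cultural Analytics paper.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class FaceRecord:
    """Hypothetical record for one extracted face (field names are assumptions)."""

    # Where the face was found in the archive
    year: int
    issue: str  # issue identifier (format assumed)
    page: int
    # Labels; None means the face has not been labeled for that dimension
    gender: Optional[str] = None
    race: Optional[str] = None
    context: Optional[str] = None  # "ad" or "feature story"
    smiling: Optional[bool] = None
    individual_portrait: Optional[bool] = None  # False if the image holds several faces


record = FaceRecord(1945, "issue_0421", 17, gender="female", context="ad", smiling=True)
print(asdict(record))
```

Using `None` for unlabeled fields reflects the split in the data: every face has a gender label, but only the 8,789-face subset carries the other characteristics.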
Our data was first collected via human labor using a custom browser-based interface. The details of our data collection methods were published in Digital Humanities Quarterly. We describe our process briefly below.
Cropping the faces
Our image cropping interface allows users to select a face with a cropping tool, if there is a face on the page.
If there is more than one face on the page, the previously cropped face(s) will be blocked out to help the user keep track of the faces cropped.
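The blocking-out step can be illustrated with a toy example. The sketch below treats a page as a grid of grayscale pixel values and fills every previously cropped rectangle, so only uncropped regions remain visible. The box format (left, top, right, bottom, with right and bottom exclusive) and the function name `block_out` are assumptions; the actual interface runs in the browser and its implementation is not described here.

```python
def block_out(page, boxes, fill=0):
    """Return a copy of `page` with each previously cropped box filled.

    page: list of rows of pixel values.
    boxes: (left, top, right, bottom) tuples in pixels, right/bottom exclusive
    (an assumed convention).
    """
    masked = [row[:] for row in page]  # leave the original page untouched
    height = len(masked)
    width = len(masked[0]) if height else 0
    for left, top, right, bottom in boxes:
        # Clip each box to the page so out-of-range coordinates are ignored.
        for y in range(max(0, top), min(bottom, height)):
            for x in range(max(0, left), min(right, width)):
                masked[y][x] = fill
    return masked


page = [[9] * 4 for _ in range(3)]
print(block_out(page, [(1, 0, 3, 2)]))
```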
After collecting almost 4,000 cropped faces from a selection of 50 issues, we verified the data for accuracy and used it to train a facial recognition algorithm, which was then used to extract all faces from the entire archive.
Details about the automated face extraction can be found here, and a demo of the interface itself can be found here.
Tagging the faces
Once a collection of faces has been cropped and checked for errors, we deploy the tagging interface to Mechanical Turk.
The tagging interface shows users a page from the archive containing a face that is highlighted with a green rectangle. The user is then required to answer a series of questions about the face. A demo of the interface can be found here.
We used this interface to collect data on faces from a selection of 100 issues, and we are using this data to train classification algorithms. So far, we have classified all the faces in the archive by gender. More details about that can be found here.
Both the cropping and tagging interfaces are adaptable and can be used for multiple purposes. The software is available here for other researchers to use. Please cite us if you use it, and feel free to contact us with any questions about how to use it.
Our research explores the context in which these images appeared, including both American history at large as well as the history of the publication itself. Our research seeks connections between the image database and the contextual framework that produced the images. We hope to underline the implicit and explicit transmission of cultural norms through the use of images.
Our first investigation looked at the relationship between the representation of women in the magazine and cultural attitudes towards women. Focusing on the period between the 1940s and the 1990s, we found that the percentage of women's faces in Time correlates with attitudes towards women, both in the larger historical context and within the textual content of the magazine.
The graph above shows the percentage of faces that present as female in each issue in our archive. The solid line is a LOWESS curve fitted to the data.
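Given per-face gender labels, the per-issue percentages plotted in such a graph reduce to a simple count. The sketch below assumes each face is available as an `(issue_id, gender_label)` pair with the label string `"female"` for female-presenting faces; the function name and the label encoding are hypothetical.

```python
from collections import defaultdict


def percent_female_by_issue(faces):
    """Percentage of faces labeled female in each issue.

    faces: iterable of (issue_id, gender_label) pairs; the "female" label
    string is an assumption about how labels are encoded.
    """
    counts = defaultdict(lambda: [0, 0])  # issue -> [female faces, all faces]
    for issue, gender in faces:
        counts[issue][1] += 1
        if gender == "female":
            counts[issue][0] += 1
    # Issues with no detected faces never appear, so there is no division by zero.
    return {issue: 100.0 * f / t for issue, (f, t) in counts.items()}


faces = [("i1", "female"), ("i1", "male"), ("i1", "male"), ("i1", "female"), ("i2", "male")]
print(percent_female_by_issue(faces))
```

Issues without any detected faces have no defined percentage, so they are simply absent from the result rather than reported as 0%.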
An interesting pattern emerges in the smoothed data: an increase in the proportion of female faces from the 1920s to 1945, a post-Second World War dip, a rebound beginning in the mid-1960s, a decrease in the 1980s, and a final rebound beginning in the early 1990s.
We found that this trend was very consistent both with the attitudes towards women expressed in the text of the magazine and with larger historical trends. Women increased their participation in public life as they entered the workforce during the Second World War, and were then ushered out of public life during the post-war period until the women's liberation movement of the 1970s. The 1980s saw a backlash against feminism, followed by a return to feminism starting in the early 1990s. These shifts in cultural attitudes toward women track well with the extent of their representation in the news magazine.
For more detail, please read our full study, "What's in a face? Gender representation of faces in Time 1940s-1990s", published in the Journal of Cultural Analytics.
In an upcoming publication, we present a detailed analysis of our subset of 8,789 faces sampled from 100 issues, which are labeled with detailed characteristics including facial expression, age, race, and context. The data reveal several large-scale trends that are consistent with the historical context of the magazine.
Currently, we are using our data collected through human labor to train algorithms to classify the images by the other categories listed above. We are particularly interested in facial expressions.
We are also currently collecting more data on advertisements, with the goal of extracting all advertisements from the archive.
An important aspect of this project is to develop interactive visualizations to allow the public to explore our data and obtain their own insights from it. The ultimate goal of this project is to create a website with contextualized visualizations based on our findings from the archive.
In one of our first visualizations, we sorted the face images chronologically. The images are presented as thumbnails on a grid, where each column represents one year. In addition to sorting the images by year along the horizontal axis, we sort them by their average RGB value along the vertical axis, so that the darkest image is on top. The background color behind each column is the color given by the average RGB pixel value for that year.
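The layout logic described above can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes each thumbnail is available as a list of `(r, g, b)` pixel tuples, uses the mean of the three channel averages as the "darkness" sort key, and computes the background from every pixel of every image in the year (equivalent to averaging per-image means when all thumbnails share one size). The function names and input layout are hypothetical.

```python
from collections import defaultdict


def mean_rgb(pixels):
    """Average (r, g, b) over a non-empty list of pixel tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def build_grid(images):
    """Group thumbnails into year columns, darkest first, plus a background colour.

    images: iterable of (year, pixels) pairs, where pixels is a non-empty list
    of (r, g, b) tuples for one thumbnail (hypothetical input layout).
    Returns {year: (column, background_rgb)} with years in ascending order.
    """
    by_year = defaultdict(list)
    for year, pixels in images:
        by_year[year].append(pixels)

    grid = {}
    for year in sorted(by_year):  # left-to-right column order
        # Brightness proxy (assumed): mean of the three channel averages.
        column = sorted(by_year[year], key=lambda px: sum(mean_rgb(px)) / 3)
        # Background: average RGB over every pixel of every image that year.
        all_pixels = [p for px in column for p in px]
        grid[year] = (column, mean_rgb(all_pixels))
    return grid
```

Because Python's sort is stable, thumbnails with identical brightness keep their original order within a column.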
In the arrangement below, each column represents a year, and the images are arranged by average RGB value from top to bottom.
Further development of this work will include ways for the user to access details on demand about a particular photo, such as the context (whether the image was part of an article or an advertisement), the issue, the date, and other metadata.
While sorting all the images by year helps elucidate large-scale temporal patterns and trends, a viewer may be overwhelmed by the sheer number of images, and an aggregate measure, such as a mean, may provide clarity or further insight.
Forming a composite image of all the images within a given year is a rudimentary but informative way to explore such aggregate features. The composite is obtained by calculating the average pixel value over all of that year's images.
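A pixel-wise mean is straightforward once all faces share the same dimensions. The sketch below assumes the face crops have already been resized to a common size (the text does not say how resizing or alignment was done) and represents each image as rows of `(r, g, b)` tuples; the `composite` name and this in-memory layout are assumptions. Grouping the faces by year first and calling `composite` on each group would give one composite per year.

```python
def composite(images):
    """Pixel-wise mean of equally sized RGB images.

    images: non-empty list of images; each image is a list of rows and each
    row a list of (r, g, b) tuples (hypothetical in-memory layout).
    """
    if not images:
        raise ValueError("need at least one image")
    height, width = len(images[0]), len(images[0][0])
    for im in images:
        if len(im) != height or any(len(row) != width for row in im):
            raise ValueError("all images must share the same dimensions")
    n = len(images)
    # Average each channel of each pixel position across all images.
    return [
        [
            tuple(sum(im[y][x][c] for im in images) / n for c in range(3))
            for x in range(width)
        ]
        for y in range(height)
    ]
```

With thousands of faces per year, an array library would be much faster, but the arithmetic is the same: each output pixel is the mean of the pixels at that position across all inputs.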
PixPlot is a tool developed by the Yale Digital Humanities Lab that uses unsupervised clustering to sort a large collection of images. We applied this software to a subset of our collection of faces, and we are developing more visualizations using this tool. Click on the image below to explore our face collection with PixPlot.
We have developed a tool to enable our viewers to explore the metadata on their own.
Ana Jofre, Principal Investigator
SUNY Polytechnic Institute
Department of Communications and Humanities
Crean College of Health and Behavioral Sciences;
Fowler School of Engineering; Electrical Engineering and Computer Science
SUNY Polytechnic Institute
Department of Computer Science
Department of History
Kathleen Brennan (Post-doc at SUNY Polytechnic)
Aisha Cornejo (Chapman University)
Carl Bennett (SUNY Polytechnic)
John Harlan (SUNY Polytechnic)
Morgan Wewer (SUNY Polytechnic)
Ethan Schneider (SUNY Polytechnic)
Matthew Donoghue (SUNY Polytechnic)
Robert Zuch (SUNY Polytechnic)