Desikan, Bhargav Srinivasa; Hajime Shimao and Helena Miton

With the increase in massive digitized datasets of cultural artefacts, social and cultural scientists have an unprecedented opportunity for the discovery and expansion of cultural theory. The WikiArt dataset is one such example, with over 250,000 high quality images of historically significant artworks by over 3000 artists, ranging from the 15th century to the present day; it is a rich source for the potential mining of patterns and differences among artists, genres, and styles. However, such datasets are often difficult to analyse and use for answering complex questions of cultural evolution and divergence because of their raw formats as image files, which are represented as multi-dimensional tensors/matrices. Recent developments in machine learning, multi-modal data analysis and image processing, however, open the door for us to create representations of images that extract important, domain-specific features from images. Art historians have long emphasised the importance of art style, and the colors used in art, as ways to characterise and retrieve art across genre, style, and artist. In this paper, we release a massive vector-based dataset of paintings (WikiArtVectors), with style representations and color distributions, which provides cultural and social scientists with a framework and database to explore relationships across these two vital dimensions. We use state-of-the-art deep learning and human perceptual color distributions to extract the representations for each painting, and aggregate them across artist, style, and genre. These vector representations and distributions can then be used in tandem with information-theoretic and distance metrics to identify large-scale patterns across art style, genre, and artist. We demonstrate the consistency of these vectors, and provide early explorations, while detailing future work and directions. All of our data and code is publicly available on GitHub.