# Cultural and heritage data resources

## Data sources

Language data sources on [Hugging Face](https://huggingface.co/datasets?sort=trending) (e.g. [isiZulu](https://huggingface.co/datasets?language=language:zu\&p=1\&sort=trending)) and [Kaggle](https://www.kaggle.com/datasets?tags=13204-NLP)
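The isiZulu link above filters Hugging Face datasets by a language code passed in the URL query string. As a minimal sketch, assuming the site keeps the same `language=language:<code>` pattern seen in that link, the same filter could be built for other languages. The helper name `hf_language_url` is hypothetical, and the `p=1` paging parameter from the original link is left out for brevity.

```python
from urllib.parse import urlencode

BASE_URL = "https://huggingface.co/datasets"


def hf_language_url(code: str, sort: str = "trending") -> str:
    """Build a dataset search URL filtered by a language code.

    Assumption: the query pattern mirrors the isiZulu link above.
    """
    params = {"language": f"language:{code}", "sort": sort}
    # Keep ":" unescaped so the URL matches the form used in the link above.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    # ISO 639-1 codes: zu = isiZulu, xh = isiXhosa, af = Afrikaans
    for code in ("zu", "xh", "af"):
        print(hf_language_url(code))
```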

[Broadcast Research Council of South Africa](https://brcsa.org.za/) data on audience trends for TV and radio are now available online.

National Archives and Records Service of South Africa (NARSSA) search of more than 8.3 million items on the ['new' database](http://www.nationalarchives.gov.za/node/737?q=search-the-collections) (partial) and the ['old' database](http://www.national.archives.gov.za/) (complete).

South African Heritage Resources Agency (SAHRA) manages the [South African Heritage Resources Information System (SAHRIS)](https://sahris.sahra.org.za/search/site)

South African History Archives (SAHA) [collections](http://www.saha.org.za/collections.htm)

HSRC Press [open access book collection](https://www.hsrcpress.ac.za/books) (400+ titles)

Wits University [Historical Papers Research Archive](http://www.historicalpapers.wits.ac.za/)

Various [digital collections](https://www.digitalcollections.lib.uct.ac.za/top-level-collection) at the University of Cape Town.

Zamani Project is posting [GIS spatial data packages and 3D models](https://zivahub.uct.ac.za/search?groups=20762\&itemTypes=3) of heritage sites across Africa

South African Centre for Digital Language Resources (SADiLaR) [collection of language resources](https://repo.sadilar.org/handle/20.500.12185/7)

[Repository of the 500 Year Archive](https://fhya.org/repository/browse), an experimental digital research tool. It is designed to support historical enquiry into the five hundred years before colonialism in what is today KwaZulu-Natal and neighbouring regions. It convenes online diverse materials including, amongst other things, texts, images, recordings, excavated items and botanical material, as well as early vernacular publications.

According to the National Library of South Africa (NLSA), "all newspapers published in South Africa are collected by the NLSA" and a [digital archive is being implemented](https://cdm21048.contentdm.oclc.org/digital/).

## Example cultural data applications

See [this exhibition of projects](https://fakugesi.co.za/hack-ur-culture-virtual-exhibition/) developed for #HackUrCulture, hosted by the Goethe-Institut Johannesburg and Credipple in October 2020.

Here are some examples of what can be done with open cultural data, mostly from the US. Please [Tweet us](https://twitter.com/OpenDataZA) if you have seen others.

![Artefact project for HackUrCulture](https://713906002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LD_JXmK-On6DnxCOASD%2F-MMU9YDY9_NIhq1KpfQQ%2F-MMUBJjBLruYBbxFQnxA%2FScreenshot%202020-11-19%20at%2008.28.20.png?alt=media\&token=bcf7da2f-9d56-4cb1-905b-e039c479d826)

[Map a trip](http://publicdomain.nypl.org/greenbook-map/trip.html) using the New York Public Library (NYPL) Green Book items

![Navigating the Green Book at NYPL](https://713906002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LD_JXmK-On6DnxCOASD%2F-MHoTHTKw08XIe8rIkW2%2F-MHoWXMg0JbfQ_tA8tgY%2FScreenshot%202020-09-21%20at%2017.39.14.png?alt=media\&token=0e730c35-dba4-49bd-8a3c-30a0703e85f1)

[Southern Mosaic](https://labs.loc.gov/work/experiments/southern-mosaic/) is a visual story using data from the US Library of Congress

![Southern Mosaic visualisation of artist locations and titles](https://713906002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LD_JXmK-On6DnxCOASD%2F-MHoTHTKw08XIe8rIkW2%2F-MHoX4KclQkkKAfE04JG%2FScreenshot%202020-09-21%20at%2017.47.09.png?alt=media\&token=df8eb3c4-598d-4c7c-8562-755c292139f3)

Also by the New York Public Library, a [visual grouping](http://publicdomain.nypl.org/pd-visualization/) of 180,000+ public domain items

[The Met](https://artsandculture.google.com/partner/the-metropolitan-museum-of-art?col=RGB_518077) has collaborated with Google to enable searching of archives using colour

A [visual timeline](https://harvardart.askewbrook.com/) of the Harvard Art Museums collection

## Additional reading on open cultural data

[Exploring Arts Engagement with (Open) Data](https://www.timdavies.org.uk/2019/02/18/exploring-arts-engagement-with-open-data/) by Tim Davies

[Open cultural data: Curating GLAM in the digital age](https://www.thejakartapost.com/life/2020/08/02/open-cultural-data-curating-glam-in-the-digital-age.html) in the Jakarta Post

[Data as Culture](http://culture.theodi.org/) from the Open Data Institute (ODI)

[A Nerd’s Guide To The 2,229 Paintings At MoMA](https://fivethirtyeight.com/features/a-nerds-guide-to-the-2229-paintings-at-moma/) and the [data on GitHub](https://github.com/MuseumofModernArt)

[How We Learned to Stop Worrying and Love Open Data: A Case Study in the Harvard Art Museums’ API](https://medium.com/@andrea_ledesma/how-we-learned-to-stop-worrying-and-love-open-data-a-case-study-in-the-harvard-art-museums-api-893c3f40ecb7) by Harvard Art Museum

A list of '[Cool stuff made with cultural heritage APIs](http://museum-api.pbworks.com/w/page/21933412/Cool%20stuff%20made%20with%20cultural%20heritage%20APIs)'

120kMoMA - [A data visualization study of The Museum of Modern Art collection dataset of 123,919 records](https://medium.com/@WallHelen/120kmoma-ae298a2a57b7)

[Using Public Domain Materials in the Classroom](https://www.nypl.org/blog/2016/01/20/public-domain-in-the-classroom) by New York Public Library

Blog on [how people have used MoMA’s data so far](https://medium.com/@foe/here-s-a-roundup-of-how-people-have-used-our-data-so-far-80862e4ce220)

## Tools to try

For visualisation, there are many tools to try, such as [Flourish](https://flourish.studio/) and [Datawrapper](https://www.datawrapper.de/). If you're more technical and use Python or R, have a look at this [summary of libraries](https://towardsdatascience.com/top-9-libraries-for-data-visualization-in-python-and-r-51bdf08e5d54).
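Chart tools such as Flourish and Datawrapper accept CSV uploads, so a common first step is shaping collection records into a small summary table. The sketch below uses only the Python standard library to count items per decade and write the result as CSV text. The records and the field name `year` are hypothetical placeholders, not taken from any of the collections listed on this page.

```python
import csv
import io
from collections import Counter

# Hypothetical catalogue records, for illustration only.
RECORDS = [
    {"title": "Item A", "year": 1913},
    {"title": "Item B", "year": 1916},
    {"title": "Item C", "year": 1924},
]


def items_per_decade(records):
    """Return sorted (decade, count) pairs, e.g. 1913 falls in 1910."""
    counts = Counter((record["year"] // 10) * 10 for record in records)
    return sorted(counts.items())


def to_csv(rows, header=("decade", "items")):
    """Serialise the rows as CSV text ready to upload to a chart tool."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(items_per_decade(RECORDS)))
```

Saving the printed text to a `.csv` file gives a table with one row per decade, which suits a simple bar or column chart.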

Have a look at these storytelling [tools from Knight Lab](https://knightlab.northwestern.edu/projects/), including Timeline, StoryMap, Soundcite and Juxtapose.

For mapping relationships or networks as a story, try [GraphCommons](https://graphcommons.com/); see [this example](https://graphcommons.com/stories/8de49ba6-68b4-4d8b-af6b-6336e7520742/slides/2) of three musicians in a recording ecosystem. [Kumu](https://kumu.io/) is also popular for network visualisation.

For mapping, a tool like [Kepler](https://kepler.gl/) is relatively easy to use. For more detail on working with spatial data, see [this page](https://opendataza.gitbook.io/toolkit/open-data-resources/working-with-spatial-data).

If you want to get data tables out of PDFs you can try [Tabula](https://tabula.technology/). [OpenRefine](https://openrefine.org/) is good for cleaning data.
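Much of data cleaning is spotting values that differ only in case, spacing, punctuation or word order. The sketch below is loosely modelled on the key-collision "fingerprint" idea that OpenRefine offers for clustering. It is a simplified approximation in plain Python, not OpenRefine's own implementation, and the sample values are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalised key so near-duplicates collide."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    # Drop accents (combining marks) left over after decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove punctuation, keeping letters, digits, underscores and spaces.
    text = re.sub(r"[^\w\s]", "", text)
    # Sort and de-duplicate tokens so word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with real variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


if __name__ == "__main__":
    # Hypothetical messy column values.
    names = ["Wits University", "wits  university", "University, Wits", "UCT"]
    for group in cluster(names):
        print(group)
```

Each printed group is a set of candidate duplicates to review by hand before merging, since a shared key does not guarantee the values mean the same thing.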

If you want to analyse text in books or articles (e.g. to identify people and places), there are lots of tools to try, such as [TextRazor](https://www.textrazor.com/demo), [Intellexer](http://demo.intellexer.com/) and [Google's Natural Language API](https://cloud.google.com/natural-language).
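To get a feel for what identifying people and places involves, the sketch below tags names using a gazetteer, a hand-made lookup list. This is the simplest form of entity tagging and is not how the services above work; they rely on trained language models and can find names they have never been given. The gazetteer entries, labels and sample sentence are illustrative assumptions.

```python
import re

# Hypothetical gazetteer: known names mapped to an entity type.
GAZETTEER = {
    "Sol Plaatje": "PERSON",
    "Kimberley": "PLACE",
    "Boshof": "PLACE",
}


def tag_entities(text, gazetteer=GAZETTEER):
    """Return (name, type) pairs for gazetteer names found in the text."""
    # Try longer names first so multi-word names are matched whole.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    return [(m.group(1), gazetteer[m.group(1)]) for m in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "Sol Plaatje was born near Boshof and later worked in Kimberley."
    for name, kind in tag_entities(sample):
        print(f"{kind}: {name}")
```

A lookup like this only finds exact spellings already on the list, which is why the model-based tools are worth trying on larger or messier texts.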

