Text and Data Mining (TDM) is a research method that involves extracting information from content using software and other technological methods. TDM can include the analysis of files, text, webpages, social media posts and so on, and may also be applied to non-text-based works such as images. TDM allows data to be searched, analysed and extracted far more quickly than manual searching, as thousands of data sources can be searched in a matter of seconds, with results surfaced and structured ready for exploration.
TDM is often an essential part of AI, machine learning and big data activities, functioning as a mechanism for training programs and for analysing and interpreting data.
Under the law, the exception allowing for TDM is headed ‘Copies for text and data analysis for non-commercial research’ but the wording within the legislation simply refers to the notion of ‘computational analysis’.
The law allows this ‘computational analysis’ to be carried out on any content to which a person engaged in non-commercial research has lawful access. This could be content available online or content that is procured and subscribed to by institutions.
There are four stages to the TDM process. First, potentially relevant documents are identified (Stage 1). These documents are then turned into a machine-readable format so that structured data can be extracted (Stage 2). The useful information is extracted (Stage 3) and then mined (Stage 4) to discover new knowledge, test hypotheses, and identify new relationships.
Image credit: JISC / Value and Benefits of Text Mining (2012)
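The four stages described above can be sketched in a few lines of Python. This is only a minimal illustration, not a description of any particular TDM tool: the sample corpus, the keyword filter and all function names (`identify`, `to_machine_readable`, `extract`, `mine`, `run_pipeline`) are hypothetical, and counting word frequencies stands in for whatever analysis a real project would carry out.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical sample corpus. In practice these would be works the
# researcher already has lawful access to.
CORPUS = {
    "doc1.html": "<p>Text mining helps researchers find patterns in text.</p>",
    "doc2.html": "<p>Copyright law includes an exception for data mining.</p>",
    "doc3.html": "<p>A recipe for lemon cake.</p>",
}


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def identify(corpus, keyword):
    """Stage 1: select potentially relevant documents (simple keyword match)."""
    return {name: raw for name, raw in corpus.items()
            if keyword.lower() in raw.lower()}


def to_machine_readable(raw_html):
    """Stage 2: turn a document into plain text by stripping the markup."""
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.parts)


def extract(text):
    """Stage 3: extract structured data, here a list of lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def mine(token_lists):
    """Stage 4: mine the extracted data, here by counting word frequencies."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts


def run_pipeline(corpus, keyword):
    """Run the four stages in order and return the word counts."""
    relevant = identify(corpus, keyword)
    texts = [to_machine_readable(raw) for raw in relevant.values()]
    tokens = [extract(text) for text in texts]
    return mine(tokens)


if __name__ == "__main__":
    # Print the five most frequent words in the documents mentioning "mining".
    print(run_pipeline(CORPUS, "mining").most_common(5))
```

Note that Stage 2 necessarily works on a copy of each document, which is the point at which copyright becomes relevant.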
TDM activity involves harvesting data for analysis, cleaning the data, and ordering and indexing it. To do that, a copy of the data to be analysed must be obtained or extracted and transferred to the appropriate or desired tool for analysis. This is where copyright comes in: making a copy without the permission of the copyright owner would normally be an act of copyright infringement, but the exception permits it.
Copyright, Designs and Patents Act 1988, as amended, s.29A https://www.legislation.gov.uk/ukpga/1988/48/section/29A
Ibid, s.29A(1)(a)
LIBER, Text & Data Mining https://libereurope.eu/topic/text-data-mining/ Accessed 05/03/2021