Over the past months, OFE has been engaged in an intensive research process on the various arguments and approaches relating to text and data mining (TDM) in Europe. It culminated in the paper published today, titled “An analytical review of text and data mining practices and approaches in Europe”.
We carried out extensive desk research covering most of the benchmark reports, such as the European Commission-funded Expert Group Report (2014), the study by De Wolf and Partners (2014) and the UK IPO’s ‘Exceptions to Copyright’ brief (October 2014), as well as numerous other reports, position papers, articles and blog posts. The initial findings were discussed at the Round Table that OFE organised in October 2015, the conclusions of which are available in the follow-up White Paper. The desk research and Round Table discussions were complemented by a series of interviews with academics, researchers, start-ups, and more established companies (including publishers and infrastructure providers).
The paper does not limit itself to pointing out the challenges faced by almost all stakeholders; it also provides policy recommendations based on the conclusions we drew from the research process.
Some of the conclusions that we have drawn:
The Commission needs to provide coherence and harmonisation for TDM across Europe, through a regulatory intervention proportional to the benefits of TDM and the costs of non-intervention.
The Commission should aim to achieve coherence in the legal provisions which it seeks to apply to TDM, with no consideration of ‘commercial’ versus ‘non-commercial’ purposes. Europe needs a regime which enables any researcher, citizen, company or other entity to engage in TDM activities, using material to which they have lawful access. The exact commercial rewards can be managed at subsequent stages, depending on the implementation of the mining outcome. The protection could be considered at the point at which some clearly commercially beneficial project, product, service, business or company has emerged.
A generalised exception for TDM represents the liberalised approach that is needed, allowing everyone to decide what to do with their content. Many voices echo the view that licences are not an alternative to a mandatory exception.
Publishers do not always push for a licensing approach purely out of a desire to maximise the royalties that they receive; some also fear that allowing too much traffic will overload (or even block) their websites or other servers. A balance should therefore be found between the measures that publishers impose to avoid website or server overload or piracy and the actual needs of the miners.
Even if TDM is to be allowed through a generalised exception, APIs will still be needed to do the actual mining. Trusted third-party platforms could provide a middle ground where publishers feel more confident that their content is not about to be misappropriated, and where miners feel they can engage in TDM without their project being put at risk of plagiarism or other sharp practice.
In order to be sustainable and to avoid the need for future legislative updates, the provision should be drafted in neutral terms that can withstand the passage of time and the likely evolution of the associated technology.
Bringing all stakeholders around the table appeared necessary in view of the legislative decision-making process, not least because there remains a degree of mistrust between some publishers and some researchers. Sometimes diverging interests explain such tension, but in other cases one category of stakeholder rightfully points to factors or aspects that are not always foreseeable, or even obvious, to other categories of stakeholder.