Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
A collective of Northland iwi has pooled decades of irreplaceable te reo Māori recordings to create a unique resource for future generations.
But as the lucrative transcription market grows, experts are warning of the potential exploitation of Māori data by international companies.
This warning comes as Dr Karaitiana Taiuru, a legal expert with a speciality in data sovereignty, names artificial intelligence as “one of the next catastrophes” facing Māori, if the Government doesn’t step in to protect the taonga.
Calls for regulation are echoed by those who’ve developed Te Hiku Media’s world-leading transcription software, powered by a 30-year archive of the collective’s Māori language radio broadcasts, including the voices of native speakers – a population whose deep understanding of the reo is rare and dwindling.
Te Hiku has done all it can to protect its irreplaceable dataset, including storing its data onshore and creating data licensing agreements. But the organisation knows some factors are out of its control.
At the same time, the Government is moving to accelerate growth in the AI sector. This week, Science, Innovation and Technology Minister Judith Collins launched a new programme to drive artificial intelligence (AI) uptake among Kiwi businesses, saying AI was predicted to contribute $76 billion to New Zealand's annual GDP by 2038.
“It is crucial we support businesses to improve their awareness and uptake of AI, so they can capitalise on the benefits as the rest of the world rapidly adopts this technology,” Collins said.
Te Hiku Media chair Kylie Brown said truly intergenerational reo was exceptionally rare. The native speakers who carried those unbroken lines were only getting older, and Brown paid tribute to those who had passed.
The archive Te Hiku has gathered spans 30 years of te reo Māori broadcasts by speakers born as early as the 1890s. It records te reo Māori as it was spoken before colonial powers severed its intergenerational transmission between primary speakers.
Brown said if someone wasn’t actively working on ways to preserve and transmit their knowledge, “then there’s something that’s going to be lost, and we probably don’t even know what it is”.
But there were further risks facing the language, with the potential exploitation of te reo Māori by international companies looking to leverage such a valuable dataset.
As international companies sought to profit from the lucrative transcription market, indigenous language datasets had become highly valued.
Taiuru – the Māori data sovereignty expert – warned of the possible impacts of the fast-paced growth of this lucrative translation market.
“I can tell you we have international companies already approaching major iwi and big Māori corporates in New Zealand and offering translation or transcription in the Māori language.”
Taiuru said these services were currently far from perfect, and were essentially built from “whatever they can find on the internet”.
To bolster their datasets, these companies had reached out to experts like Taiuru to solicit their services as AI trainers. He had received such emails “on a regular basis”.
The translations some of these groups provided were "really shocking", said Taiuru. They "don't make sense, or don't even have Māori words in there". But AI learns from its mistakes, a process accelerated by infusions of good data. "The more they can get, the more powerful their system becomes."
Data was everything for these companies, he said. Taiuru couldn’t put a dollar figure on it, but said the value was significant. “It’s worthwhile for international companies to try to get it.”
Te Hiku chief executive Peter-Lucas Jones said he thought of data like land: for Māori, a story of loss. He did not want to see history repeat itself in the digital age. Much data was "vacuumed up and used without our permission", he said, but where indigenous data was concerned, such use would be especially flagrant.
“When we think about what that could mean for indigenous languages like te reo Māori, which was literally beaten out of the mouths of those generations above us, it would enable the development of a service that would be sold back to the very people that had it forcibly removed from them,” Jones said.
Northland's relative lack of fertile land meant the impacts of colonisation were felt less there than in other areas, he said. "Our isolation provided us an opportunity to preserve our language and culture in ways that other tribal groups perhaps didn't have."
Some of those speakers, including Jones’ grandparents, used radio to broadcast their reo. And, critically, they archived all of it – some of it piling up in storerooms on cassette tapes.
“Who would have thought that our 30-year sound archive would have been perfect for the development of these tools to address problems associated with language revitalisation in the digital world today?” asked Jones. “Who would have thought that our grandparents, who were born in communities where you could count the white people on one hand, would have been so in tune with what the future would hold for their mokopuna?”
What Te Hiku had developed was precisely the gold mine overseas companies were looking for, and while Te Hiku had done what it could to secure it, the law had lagged behind.
Taiuru said urgent revisions to intellectual property laws, specifically the Copyright Act, were necessary to protect valuable New Zealand data – indigenous or not.
Archives New Zealand – the government agency that looks after the country’s official record – does not currently consider AI as part of its archives strategy. “If they were collecting artworks or taonga or other things of national interest, there’d be government assistance, right?” asked Taiuru. “There’d be some sort of intervention. But because it’s data and it’s AI, we’re not seeing that yet.”
He saw this lack of provision as a major problem, and agreed that addressing it was something the Government could either choose to do now or be forced to do in the near future.
Waitangi Tribunal claim Wai262 offered another option. The claim involves the legal standing of taonga in New Zealand law. Taiuru said iwi needed to be “proactive in ensuring that data is treated the same way as we treat our land and natural resources: as a taonga that requires Māori governance, Māori co-design and investments”.
In the meantime, he said, New Zealand companies could work to repatriate their data, moving it from overseas servers to onshore facilities. Te Hiku has done this, and now has a GPU cluster (a group of physical computer servers) in Kaitaia. Jones said the in-house option secured control, and was also a cheaper way to produce the AI models that underpin the organisation's mission.
Add the in-house systems to a veritable gold mine of data, and put the processing power in the hands of invested community members, and “that’s how we’re creating the best bilingual tools for te reo Māori,” he said.
When shared between iwi radio stations, this data is protected by Te Hiku’s Kaitiakitanga data licence. But that licence is an opt-in arrangement, and the sanctity of the data has yet to be tested by an outside purveyor.
"I don't even know how huge that risk is," said Brown, "but kaumātua and elders want their knowledge maintained and restored and gifted and transmitted to their children and grandchildren in perpetuity, and protection of that resource is paramount."