The majority of AI training data will be synthetic by next year, says Gartner

2023-08-03 21:37:05

Most data used to train machine learning models will be synthetic and automatically generated, a new report from Gartner predicts. Only 1% of all AI training data was synthetic in 2021, but analysts suggest the figure could hit 60% by the end of 2024. Governance and vigilance about bias are essential if synthetic data is to avoid the same challenges as organic data, one expert told Tech Monitor.

Analysts predict more than 60% of data used to train AI models will be synthetic by the end of 2024. (Photo by Yurchanka Siarhei/Shutterstock)

Synthetic data is generated by AI to fill gaps in real-world information, such as medical imaging or data on specific disease patterns. In new research on trends in data science, published this week, Gartner predicts that by 2024 more than 60% of all AI model training data will be synthetic, something it says will lead to better AI systems.
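
To make the idea concrete, the sketch below shows one very simple, hypothetical way synthetic records can be produced: fitting summary statistics to a small "real" dataset and sampling new records from the fitted distribution. The feature names and values are invented for illustration; production synthetic-data tools typically rely on far richer generative models.

```python
# Illustrative sketch only: generating synthetic tabular records by fitting a
# multivariate Gaussian to a small "real" dataset and sampling from it.
# The dataset and feature meanings are hypothetical.
import numpy as np

rng = np.random.default_rng(seed=0)

# Pretend these are 50 real patient records with 3 numeric features
# (e.g. age, blood pressure, cholesterol) -- purely invented values.
real = rng.normal(loc=[55.0, 130.0, 200.0], scale=[10.0, 15.0, 30.0], size=(50, 3))

# Fit simple summary statistics to the real data.
mean = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample 500 synthetic records that follow the same joint distribution,
# filling the gap left by the small real dataset.
synthetic = rng.multivariate_normal(mean, cov, size=500)

print("real records:", real.shape[0], "synthetic records:", synthetic.shape[0])
```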

This move from organic to synthetic training data is part of a wider shift towards data-centric AI techniques, such as those used to produce large language models and foundation models. “Solutions such as AI-specific data management, synthetic data and data labelling technologies, aim to solve many data challenges, including accessibility, volume, privacy, security, complexity and scope,” Gartner’s report says.

A recent report by GlobalData found that synthetic data start-ups were “redefining the landscape of data generation”. Describing it as the “master key to AI’s future”, Kiran Raj, practice head of disruptive tech at GlobalData, said the start-ups were breaking through the shackles of data quality and regulation. “As the demand for reliable, cost-effective, time-efficient, and privacy-preserving data continues to accelerate, start-ups envision a future powered by synthetic data, ushering a new era of machine learning progress,” Raj said.

Synthetic data has the potential to deliver positive impacts across a range of sectors. In healthcare, it is already being used to augment real patient data for training doctors, improving drug discovery and optimising systems. In financial services, it is helping to mitigate risk and detect fraud. And in retail, it is improving demand forecasting, personalised marketing and fraud detection.

AI moving to the edge

Other key trends noted by Gartner include a shift towards edge processing for AI. Processing data at the point of creation will help organisations gain real-time insights and detect new patterns, according to the report. It will also make it easier to meet ever more stringent data privacy requirements. The organisation predicts more than 55% of data analysis by neural networks will occur in an edge system by 2025.
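
As a rough illustration of the edge pattern Gartner describes, the hypothetical sketch below scores each sensor reading locally with a tiny neural network, so only flagged events, rather than raw data, are forwarded upstream. The weights, threshold and data are all invented.

```python
# Illustrative sketch only: running a tiny (hypothetical, "pre-trained") neural
# network at the edge so raw sensor readings never leave the device -- only
# detected events do.
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical pre-trained weights for a 4-input, 1-output anomaly scorer.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def edge_score(reading: np.ndarray) -> float:
    """Score one sensor reading locally (sigmoid output in [0, 1])."""
    h = np.maximum(reading @ W1 + b1, 0.0)              # ReLU hidden layer
    return (1.0 / (1.0 + np.exp(-(h @ W2 + b2)))).item()

# Process readings at the point of creation; forward only flagged events.
events = []
for reading in rng.normal(size=(100, 4)):               # simulated sensor stream
    if edge_score(reading) > 0.9:                       # threshold is arbitrary
        events.append(reading.tolist())                 # only anomalies leave the device

print(f"{len(events)} of 100 readings forwarded to the cloud")
```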

Gartner analysts also predict a greater emphasis on responsible AI. This means ensuring the technology is used as a positive force rather than a threat to society, and that businesses make ethical choices when adopting AI, addressing societal value, risk, trust, accountability and transparency. These are the core requirements underpinning many of the AI regulations being developed around the world, including in the UK.

Organisations should adopt a “risk-proportional approach” to AI investment and deployment, the analysts warned. This includes exercising caution when applying solutions and models, and seeking assurances from vendors that they are managing their own risk and compliance obligations. Doing so will help protect organisations from financial loss and legal action.

Some foundation model and generative AI organisations are offering degrees of indemnity from these risks. Adobe says it will cover costs associated with copyright claims arising from the use of its Firefly generative AI image model. This is because the company is confident the model is trained solely on licensed and authorised data that won’t produce copyright-suspect output.

Healthcare and disease detection

Peter Krensky, director analyst at Gartner, said: “As machine learning adoption continues to grow rapidly across industries, data is evolving from just focusing on predictive models, towards a more democratised, dynamic, and data-centric discipline. This is now also fuelled by the fervour around generative AI. While potential risks are emerging, so too are the many new capabilities and use cases for data scientists and their organisations.”

Caroline Carruthers, data expert and co-founder of global data consultancy Carruthers and Jackson, told Tech Monitor that synthetic data was an invaluable tool for training AI models, particularly where large datasets weren’t available. “It’s been used most effectively in the healthcare sector, where data on rare diseases has been supplemented by synthetic data to improve modelling of treatment options,” she said.

Carruthers said that while there is “clear value to expanding limited datasets with synthetic data, there are a number of risks”, including the possibility that biases prominent in a small dataset might be amplified in the synthetic data generated from it. She added: “The bottom line is that synthetic data faces the same challenges as organic data when it comes to the need for governance and being vigilant about potential biases.”
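
Carruthers’ point about bias can be illustrated with a small, hypothetical example: if a naive generator simply reproduces the proportions observed in a skewed real dataset, the skew is replicated at scale in the synthetic data rather than corrected. The groups and proportions below are invented.

```python
# Illustrative sketch only: how a bias in a small real dataset can be carried
# into synthetic data generated from it. Labels and proportions are invented.
import numpy as np

rng = np.random.default_rng(seed=2)

# Small, biased "real" dataset: 90% of records belong to group A.
real_groups = np.array(["A"] * 90 + ["B"] * 10)

# A naive generator samples new records according to the observed proportions,
# so the 9:1 skew is reproduced at scale rather than corrected.
observed = {g: (real_groups == g).mean() for g in ["A", "B"]}
synthetic_groups = rng.choice(["A", "B"], size=10_000,
                              p=[observed["A"], observed["B"]])

for g in ["A", "B"]:
    print(g, "real share:", observed[g],
          "synthetic share:", (synthetic_groups == g).mean())
```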

Read more: Adobe Firefly offers indemnity from generative AI copyright claims

Topics in this article: AI
