White Hat Hackers Discover Microsoft Leak of 38TB of Internal Data Via Azure Storage
The Microsoft leak, which stemmed from AI researchers sharing open-source training data on GitHub, has been mitigated.
Microsoft has patched a vulnerability that exposed 38TB of private data from its AI research division. White hat hackers from cloud security company Wiz discovered a shareable link based on Azure Shared Access Signature (SAS) tokens on June 22, 2023. The hackers reported it to the Microsoft Security Response Center, which invalidated the SAS token by June 24 and, on July 7, replaced the token on the GitHub page where it had originally been posted.
The hackers came across the vulnerability while scanning the internet for misconfigured storage containers, a well-known entry point into cloud-hosted data. Among their finds was robust-models-transfer, a GitHub repository of open-source code and AI models for image recognition used by Microsoft's AI research division.
The vulnerability originated from a SAS token for an internal storage account. While working on open-source AI learning models, a Microsoft employee published a URL in a public GitHub repository pointing to an AI dataset in Azure Blob Storage (Azure's object storage service). The token embedded in that URL was scoped to the entire storage account rather than to the dataset alone, and it granted full control rather than read-only permissions, so the Wiz team could use it to access everything in the account.
When the Wiz hackers followed the link, they were able to access a storage account that contained disk backups of two former employees' workstation profiles and internal Microsoft Teams messages. In total, the account held 38TB of data, including secrets, private keys and passwords alongside the open-source AI training data.
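SAS tokens can be issued with very long expiry times, and Azure does not keep a record of the tokens that are generated, so they are hard to monitor and revoke. For that reason they aren't typically recommended for sharing important data externally. A September 7 Microsoft security blog pointed out that "Attackers may create a high-privileged SAS token with long expiry to preserve valid credentials for a long period."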
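A SAS token travels as query parameters on the shared URL, so the two risky settings in this incident, account-wide scope and a distant expiry, can often be spotted just by reading the link. The sketch below is a hypothetical audit helper written for illustration, not a Microsoft or Wiz tool. It assumes the standard SAS query parameters (`ss` and `srt` mark an account-level token, `sp` lists permissions, `se` is the expiry) and an arbitrary seven-day lifetime policy; it never verifies the signature or contacts Azure.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse

# Hypothetical policy: permission flags that let a token holder change data
# (a = add, c = create, w = write, d = delete). A subset of Azure's SAS flags.
WRITE_FLAGS = set("acwd")


def audit_sas_url(url, now=None, max_lifetime=timedelta(days=7)):
    """Return human-readable warnings for a SAS URL (illustration only)."""
    now = now or datetime.now(timezone.utc)
    # parse_qs also percent-decodes values such as 00%3A00%3A00Z
    params = {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
    warnings = []

    # Account SAS tokens carry 'ss' (services) and 'srt' (resource types)
    # and reach the whole storage account, not a single blob or container.
    if "ss" in params or "srt" in params:
        warnings.append("account-level SAS: scope covers the entire storage account")

    risky = set(params.get("sp", "")) & WRITE_FLAGS
    if risky:
        warnings.append("write-capable permissions: " + "".join(sorted(risky)))

    expiry = params.get("se")
    if expiry is None:
        # A service SAS may take its expiry from a stored access policy instead.
        warnings.append("no 'se' (expiry) parameter in the URL")
    else:
        expires_at = datetime.fromisoformat(expiry.replace("Z", "+00:00"))
        if expires_at.tzinfo is None:  # date-only or naive values: treat as UTC
            expires_at = expires_at.replace(tzinfo=timezone.utc)
        if expires_at - now > max_lifetime:
            warnings.append(f"expires {expiry}, more than {max_lifetime.days} days from now")
    return warnings


if __name__ == "__main__":
    # Hypothetical account name and signature; not the URL from the incident.
    shared = ("https://exampleaccount.blob.core.windows.net/models"
              "?sv=2021-08-06&ss=b&srt=sco&sp=racwdl"
              "&se=2050-01-01T00%3A00%3A00Z&sig=REDACTED")
    for warning in audit_sas_url(shared):
        print(warning)
```

Run against the hypothetical link above, the helper reports all three problems: account-wide scope, write-capable permissions and a far-off expiry. Because it only parses the URL, it can be pointed at any link found in a repository without needing credentials, but a clean result does not prove a token is safe.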
Microsoft noted that no customer data was ever included in the information that was exposed, and that there was no risk of other Microsoft services being breached because of the AI data set.
Nothing about this case is unique to the fact that Microsoft was working on AI training; any very large open-source data set could conceivably be shared this way. However, Wiz pointed out in its blog post, "Researchers collect and share massive amounts of external and internal data to construct the required training information for their AI models. This poses inherent security risks tied to high-scale data sharing."
Wiz suggested organizations looking to avoid similar incidents should caution employees against oversharing data. In this case, the Microsoft researchers could have moved the public AI data set to a dedicated storage account.
Organizations should be alert for supply chain attacks, which can occur if attackers inject malicious code into files that are open to public access through improper permissions.
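One general safeguard against this kind of tampering is to check downloaded files against a checksum published through a channel an attacker with storage access cannot also modify; a digest stored in the same writable storage account offers no protection. The following is a minimal Python sketch of that idea, not a step Wiz or Microsoft prescribed for this incident, and the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large model files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True only if the file matches a digest published through a separate channel."""
    # Normalize the published digest so upper-case or padded hex still compares correctly.
    return sha256_of_file(path) == expected_sha256.strip().lower()
```

A pipeline would call `verify_download` before loading any model or dataset file and refuse to proceed on a mismatch.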
“As we see wider adoption of AI models within companies, it’s important to raise awareness of relevant security risks at every step of the AI development process, and make sure the security team works closely with the data science and research teams to ensure proper guardrails are defined,” the Wiz team wrote in their blog post.
TechRepublic has reached out to Microsoft and Wiz for comments.