Bias in AI Models: Could Synthetic Data Be the Solution for Ethical AI?

Once considered the preserve of data scientists, artificial intelligence is now ubiquitous in our daily lives.

Today, it lets business analysts, industry experts and data scientists collaborate and quickly obtain automated insights from data.

Human Resources departments, for example, receive a huge number of applications for every new position, especially in the technology sector. To identify the best candidate and fill the vacancy as quickly as possible, artificial intelligence can be used to automate screening and surface the most relevant information. However, like any other technology, AI depends almost entirely on people, who are a key element in building effective models. When implementing machine learning and deep learning algorithms, humans must be able to identify hidden biases in the data. This helps prevent models from learning patterns that lead to discriminatory results.

It is humans who provide these models with the data from which they extrapolate patterns and trends. However, biases can creep into this process, introducing poor-quality data into AI models and leading to erroneous and biased results.

How can companies ensure that their AI models not only provide fast and accurate insights, but are also ethical?

Ethical Insights Depend on Data

In short: AI is a pattern recognition tool. It responds to incoming data the way it was programmed to. How it is built is the responsibility of data scientists and developers, but the data that makes it work is often collected, delivered, and contextualized by separate departments.

However, these activities must be governed to ensure the ethical use of data across the company. The legal framework defines how, where and when data can (or should) be used, or must not be used at all. In the EU, the GDPR provides a general framework, but meeting the demand for fairness, ownership and data transparency requires each company to adopt its own specific rules and internal processes.

Without cross-departmental skills, data knowledge, and a regulatory framework, the data selected to feed AI models may be erroneous, incomplete, or inappropriate. In particular, it may contain discriminatory elements. Such was the case with the retail giant Amazon, which developed a prototype recruiting algorithm. It was built on ten years of data and aimed to identify the best candidates based on the past performance of employees in similar positions. However, the algorithm proved to be discriminatory, especially against people who self-identify as women. Why?

Consequences of Feeding AI Low-Quality Data

One of the main misconceptions about AI is that it is a magic box that can predict the future. In fact, AI is a pattern recognition tool that works strictly from the data it is given. If you feed an AI model 1,000 data points about high performers and 84% of those employees share a common attribute, the AI will focus on that attribute. In the Amazon example above, the model concluded that male candidates were better suited to the role simply because, historically, more men had done the job.
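To make this concrete, here is a deliberately naive sketch in Python of a "model" that scores candidates purely by how often each attribute appears among past high performers. The attribute names (`skill`, `A`, `B`) and the frequency-based scoring rule are hypothetical, chosen only to illustrate the mechanism; real recruiting models are far more complex, but they can absorb a majority attribute in the same way.

```python
from collections import Counter


def learn_attribute_weights(high_performers):
    """Weight each attribute by the share of high performers who have it.

    This naive 'pattern recognition' model knows nothing about *why* an
    attribute is common; it only counts how often it appears in the data.
    """
    counts = Counter(attr for person in high_performers for attr in person)
    total = len(high_performers)
    return {attr: n / total for attr, n in counts.items()}


def score(candidate, weights):
    """Sum the learned weights of the candidate's attributes."""
    return sum(weights.get(attr, 0.0) for attr in candidate)


# Hypothetical training set: 1,000 high performers who all have the real
# skill, but 840 of them (84%) also share attribute "A" for historical reasons.
training = [{"skill", "A"}] * 840 + [{"skill", "B"}] * 160
weights = learn_attribute_weights(training)

# Two equally skilled candidates that differ only in the historical attribute.
candidate_a = {"skill", "A"}
candidate_b = {"skill", "B"}
print(weights["A"], weights["B"])
print(score(candidate_a, weights) > score(candidate_b, weights))
```

Both candidates have the same skill, yet the one with the majority attribute scores higher, because that attribute was frequent in the training data, not because it says anything about ability.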

The issue of bias is rooted in historical and systemic problems in the technology industry. Studies show that women make up only 16% of senior staff in this sector worldwide. Although this is an obvious and harmful form of discrimination, the AI reproduces it because it is present in the data provided to it. In this case, the model penalized applications containing the word “women” or related terms, since women had historically been less likely to be hired than men. The Amazon development team neutralized these terms, but this could not guarantee that the algorithm would not find other patterns favoring men, as the model had been trained on skewed data. Amazon therefore shut down the project.
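Why blocking specific words is not enough can be shown with a small hypothetical sketch. The resume history, the token names and the made-up proxy token `club_x` are all invented for illustration: once the explicit term is removed, a token that historically co-occurred with it still separates hired from rejected resumes, so a model trained on this history can rediscover the same pattern.

```python
def drop_terms(resume, blocked):
    """Return the resume's tokens with the blocked terms removed."""
    return resume - blocked


def hire_rate(history, token, present=True):
    """Share of past resumes that led to a hire, among those that do
    (present=True) or do not (present=False) contain `token`."""
    outcomes = [hired for tokens, hired in history if (token in tokens) == present]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


# Hypothetical history of (resume tokens, hired?) pairs. Resumes containing
# "women" also tend to contain the unrelated token "club_x".
history = (
    [({"python", "women", "club_x"}, False)] * 80
    + [({"python", "women", "club_x"}, True)] * 20
    + [({"python"}, True)] * 60
    + [({"python"}, False)] * 40
)

# "Fix" the data by blocking the explicit term.
cleaned = [(drop_terms(tokens, {"women"}), hired) for tokens, hired in history]

# The proxy token still carries the same historical signal.
print(hire_rate(cleaned, "club_x"), hire_rate(cleaned, "club_x", present=False))
```

The explicit word is gone, but resumes with the proxy token were hired at a much lower rate than those without it, which is exactly the kind of pattern a model trained on skewed data can pick up again.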

Synthetic Data: Ethical Support?

Ethical, AI-driven insight requires three main elements. The first is data. The second is the quality of that data, which ensures reliable information once it is fed into the AI model. But what if there isn’t enough quality data?

This is where synthetic data comes into play: information that is generated, automatically annotated and extrapolated from fully representative datasets. This artificially generated data mimics the statistical properties of the original dataset without exposing any of the real records from which it was generated. Synthetic data can also be extrapolated from the real dataset to adjust its size without changing its statistical properties or representativeness. The third important element is having a qualified and well-trained team.
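As an illustration of "mimicking the statistical properties" of a dataset, here is a minimal sketch that fits the means, standard deviations and linear correlation of two numeric columns and then samples any number of new rows from a bivariate Gaussian with those parameters. This is one simple parametric technique chosen for illustration, not the method of any particular synthetic-data product; the columns (years of experience, test score) and the "real" data are hypothetical.

```python
import math
import random
import statistics


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def fit(rows):
    """Summarize (x, y) rows by means, standard deviations and correlation."""
    xs, ys = zip(*rows)
    return {
        "mx": statistics.fmean(xs), "my": statistics.fmean(ys),
        "sx": statistics.stdev(xs), "sy": statistics.stdev(ys),
        "rho": pearson(xs, ys),
    }


def synthesize(params, n, rng):
    """Draw n new rows from a bivariate Gaussian with the fitted parameters.

    Correlation is reproduced by mixing two independent standard normals
    (the 2x2 Cholesky construction).
    """
    rho = params["rho"]
    mix = math.sqrt(max(0.0, 1 - rho ** 2))
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = params["mx"] + params["sx"] * z1
        y = params["my"] + params["sy"] * (rho * z1 + mix * z2)
        rows.append((x, y))
    return rows


# Hypothetical "real" data: 500 rows of (years of experience, test score).
rng = random.Random(42)
real = []
for _ in range(500):
    years = rng.uniform(0, 20)
    real.append((years, 50 + 2 * years + rng.gauss(0, 5)))

# Resize freely: 5,000 synthetic rows from a 500-row original.
params = fit(real)
synthetic = synthesize(params, 5000, rng)
print(params)
print(fit(synthetic))
```

The synthetic rows are new draws rather than copies of real ones, and their summary statistics closely track the originals. A Gaussian model only preserves these first-order summaries and linear correlation; production generators model much richer structure, and matching statistics alone is not a formal privacy guarantee.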

Data is loaded into the AI model relatively late in the process. The development team usually receives a request for a specific model for a specific task. To build this model, the team requests data from the relevant departments, including HR. If this data is provided without prior verification and cleaning (for example, a list of resumes from the last ten years), it is likely to contain a large amount of biased data.
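A basic verification step before handing data to the development team could be an audit of how each group is represented and how often it received a positive outcome. The sketch below is hypothetical: the field names `gender` and `hired`, the sample records and the 0.8 threshold (borrowed from the "four-fifths" rule of thumb used in US employment guidance) are all assumptions for illustration.

```python
from collections import defaultdict


def audit(records, group_key, outcome_key, min_ratio=0.8):
    """Report each group's share of the data and its positive-outcome rate.

    A group is flagged when its outcome rate falls below `min_ratio` times
    the highest group's rate.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        group = r[group_key]
        totals[group] += 1
        positives[group] += bool(r[outcome_key])

    n = len(records)
    rates = {g: positives[g] / totals[g] for g in totals}
    best = max(rates.values())
    return {
        g: {
            "share": totals[g] / n,
            "rate": rates[g],
            "flagged": best > 0 and rates[g] < min_ratio * best,
        }
        for g in totals
    }


# Hypothetical ten-year HR extract: 800 "m" records (240 hired)
# and 200 "f" records (20 hired).
records = (
    [{"gender": "m", "hired": i < 240} for i in range(800)]
    + [{"gender": "f", "hired": i < 20} for i in range(200)]
)

report = audit(records, "gender", "hired")
for group, row in report.items():
    print(group, row)
```

Here the under-represented group is also hired at a third of the rate of the majority group, so it is flagged; such a report is a prompt for review by people who know the data, not an automatic verdict.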

Creating a representative sample from the collected data is relatively easy with data sampling tools. By cleaning the database first, you can draw a random sample that is free of the most obvious biases. You can also use privacy protection techniques to ensure that data cannot be traced back to individuals. However, without combining data expertise and industry knowledge, developers lack the experience to create representative samples on their own. Skills within each department also need to be strengthened to facilitate data handling and support company goals.
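The two techniques mentioned here can be sketched with the Python standard library. Stratified sampling draws the same number of records from each group so that a minority group is not drowned out (a balancing choice; sampling in proportion to a reference population is the other common option). Pseudonymization replaces direct identifiers with a keyed HMAC-SHA256 hash. The field names, records and secret key below are hypothetical, and keyed hashing is pseudonymization rather than full anonymization.

```python
import hashlib
import hmac
import random
from collections import defaultdict


def pseudonymize(identifier: str, secret_key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256).

    Without the key, the original value cannot simply be recomputed and
    looked up; the same input always maps to the same pseudonym.
    """
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def stratified_sample(records, group_key, per_group, rng):
    """Draw `per_group` records (without replacement) from every group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[group_key]].append(r)
    sample = []
    for group, members in groups.items():
        if len(members) < per_group:
            raise ValueError(f"group {group!r} has only {len(members)} records")
        sample.extend(rng.sample(members, per_group))
    return sample


# Hypothetical records: 800 in group "m", 200 in group "f".
records = [{"name": f"cand{i}", "gender": "m" if i % 5 else "f"} for i in range(1000)]
KEY = b"hypothetical-secret-key"  # in practice, stored outside the dataset

balanced = stratified_sample(records, "gender", 150, random.Random(0))
cleaned = [{**r, "name": pseudonymize(r["name"], KEY)} for r in balanced]
print(len(cleaned), cleaned[0])
```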

To stay innovative and realize the added value of data, cross-departmental use of AI models and their results is essential. As more and more employees access data and use analytics, siloed data warehouses are broken down and analytics becomes a more collaborative process; recruiting and developing business experts is therefore an important step towards eradicating biased data.

Diverse teams in the field are much better able to detect these biases through their own experience. By working within a well-defined and contextualized regulatory framework, developers and data scientists can collaborate more effectively with different teams to feed better data into AI models, improve their accuracy, and thus deliver more ethical results.
