Date: 2023-07-12
Data science, a collection of statistical tools that allows us to mine for information in data we gathered, was met with great skepticism when it started in the 1960s. It seemed (too) easy to find patterns in almost any set of data: spurious ones.
But fast-forward to 2012, and we see the Harvard Business Review claim, after the advent of big data provoked a new gold rush, that data scientist is “the sexiest job of the 21st century.”
A similar story can be told for artificial intelligence. For decades it was unable to live up to expectations and started to consolidate only in the 1980s when proven probabilistic modeling techniques from outside AI were adopted and greatly advanced the field. So when big data started to become available in the 2000s, it fell on fertile soil.
When we talk about big data, we mean big: trillions of words of text, billions of images, billions of hours of speech or video, or the daily generation of billions of data points from swiping credit cards, clickstreams and social media. Big data has transformed both data science and AI.
Equipped with sophisticated analytical tools, data scientists can now identify patterns impossible to find otherwise, and algorithms, trained on big data combined with recent deep-learning techniques, produce staggering results.
For example, instead of analyzing speech based on language rules (as had been done for decades), we now make probabilistic predictions of what the next word in a sentence should be. This approach produces results so good that it has begun to outperform humans even in areas considered “safe from AI” for decades to come.
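To make the idea of predicting the next word concrete, here is a minimal sketch in Python. It is a toy word-pair (bigram) counter, not the deep-learning systems the column refers to; the function names and the tiny sample text are invented for illustration only.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the follower counts for `word` into probabilities."""
    followers = counts.get(word.lower())
    if not followers:
        return {}  # the word never appeared with a follower in the training text
    total = sum(followers.values())
    return {w: n / total for w, n in followers.items()}


# Hypothetical sample text; real systems train on vastly more data.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")

# Pick the most probable next word after "the".
best_guess = max(probs, key=probs.get)
print(best_guess, probs)
```

No grammar rule appears anywhere in the sketch: the guess comes entirely from how often one word followed another in the sample text, which is why the amount of available data matters so much.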
Many argue that data science applied to big data or in AI is more than a collection of mathematical tools and techniques; it is a force potent enough to shape and transform societies at the scale of what the taming of fire or electricity did. The upside is an endless string of new possibilities.
The AI sector is projected to grow from $30 billion to more than $150 billion within the next four years. Performance of the S&P 500 is in sync with the stocks of a handful of companies, among them Apple, Microsoft, Alphabet (Google), Amazon and Meta (Facebook).
The downside is familiar, too. Social media pose serious threats to our mental health and encourage the growth and solidification of echo chambers that threaten to tear our society apart. Online platforms infringe upon our autonomy with attempts to manipulate our decisions or nudge us toward outcomes we did not deliberately choose.
And when powerful agents (companies, governments) leverage big data against individuals or citizens, consequences can be nefarious, especially when people are vulnerable already (say, as a result of race, gender or socioeconomic standing). Bad intent is not required, although snake oil is being sold; poor design is enough to create what Cathy O’Neil, who worked in the field for years before writing her book of the same title, called “weapons of math destruction.”
This is where data science ethics enters. All relevant professional organizations – the American Statistical Association, the Association for Computing Machinery, the Institute of Electrical and Electronics Engineers Computer Society, and the Data Science Association, to name just the usual suspects – have a code of ethics and/or professional conduct that speaks to risks and opportunities. The overarching theme they all share is beyond dispute: Science and engineering shall benefit humanity.
In addition, we have a host of nonprofit organizations and government white papers dedicated to the same goal: building an AI society that is good and just.
We also have concrete proposals, detailing specific steps for making ethical considerations an integral part of the design process before any data has been collected or a single line of code has been written. When followed with some diligence, all these frameworks can stop weapons of math destruction from being developed or deployed.
(An aside: Data science ethics will soon become part of the education for students who major in data science and applied statistics at Purdue Fort Wayne.)
Ethical or professional codes of conduct are pipe dreams, however, when the environment is unsupportive or cynical about moral values and puts profits before people.
Two years ago, Frances Haugen (who blew the whistle on Facebook) argued, “Congressional action is needed.” This was set in motion recently.
The White House published its “Blueprint for an AI Bill of Rights” last October, and the National Institute of Standards and Technology has released its “Artificial Intelligence Risk Management Framework.” Important players (such as OpenAI, Microsoft and Google) meet with Congress and the White House and float their own ideas for how their AI products should be regulated.
The main open questions are whether different sectors, such as health care and consumer products, should see different regulations and whether federal oversight should be provided by a single agency or more than one.
The European Union already released a draft of its regulatory framework, the “AI Act,” in April 2021. It regulates AI according to three risk categories: unacceptable risks (e.g., social scoring) are prohibited; high-risk AI (e.g., HR issues) or AI with specific transparency obligations (e.g., chatbots) are permitted but subject to regulation; minimal or no-risk AI is permitted without regulation.
Another option, related to or independent of regulation, is certification, or licensing, of AI products. My eggs may be certified as organic by the U.S. Department of Agriculture, and my appliances are safe according to NSF/ANSI standards; why not have the same for AI products? The Institute of Electrical and Electronics Engineers Standards Association released such a proposal, The Ethics Certification Program for Autonomous and Intelligent Systems, in 2020.
The cooperation between professional self-regulation and government oversight has been in place for biomedical research for quite some time now; it seems to be effective. It is too early to tell whether this is a model that data science or AI can and will emulate.
Bernd Buldt is professor of mathematical logic and the foundations of the exact sciences in the Department of Mathematical Sciences in the College of Science at Purdue University Fort Wayne. All views expressed are those of the author.