Artificial Intelligence, Data Science, and Transformation
Data science is being transformed.
Artificial intelligence has already produced powerful tools for analysis, but it is only the beginning of a more ambitious phase: making AI systems more accurate, less biased, and more effective as prediction tools.
It Was the Cloud
Computing power, driven by Moore’s Law, went a long way. Raw computing power and local processing were the initial bottlenecks, but these have mostly been overcome. The next phase, cloud computing, moves both data and computation away from the computer user – your data, processing, and software tools are somewhere else. (Distributed file systems were an early form of cloud computing: files weren’t located on a specific machine but on some file server somewhere.)
Now It’s the Data – and Finding Value Somewhere in the Haystack
The fixation on processing cycles, CPUs, GPUs, and parallel servers has drawn attention away from the tsunami of data that now needs to be accessed and processed. Extracting value from data is what gives the entire data gathering and processing industry meaning – and value.
Gathering more and more raw data does not create value. One cannot simply push a button and have valuable output generated. Data needs to be collected, processed, stored, managed, analyzed, and visualized – only then can we begin to interpret the results. Each step is challenging, and every step in this cycle requires massive amounts of work and value-added tools.
Moreover, every step in the cycle needs to be secure, reliable, available, private, and usable. Each function has its own steps and complexities, but only by weaving AI tools through the entire cycle can we extract usable data. Formal methods exist for software robustness, security protocols, and privacy, but we are only at the beginning of applying them across this cycle.
Robustness, Comprehensiveness, and Fairness
For a system to be robust, its outcomes cannot depend on small bits of input. We are still challenged to deliver the kind of thoroughness that allows a robust solution to also be comprehensive. In other words, data analysis should generate usable output, and that output should be interpretable without built-in bias.
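One way to make the robustness point concrete is to compare two summary statistics when a single input is corrupted. The sketch below is only an illustration: the readings are made up, and `shift_from_one_bad_value` is a hypothetical helper, not part of any library. It shows that the mean is dragged far off by one bad value while the median barely moves.

```python
from statistics import mean, median


def shift_from_one_bad_value(estimator, data, bad_value):
    """Return how far the estimate moves when the first value is replaced."""
    corrupted = list(data)
    corrupted[0] = bad_value
    return abs(estimator(corrupted) - estimator(data))


# Made-up sensor readings, used purely for illustration.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1]

# Replace one reading with an obviously bad value and measure the damage.
mean_shift = shift_from_one_bad_value(mean, readings, 1000.0)
median_shift = shift_from_one_bad_value(median, readings, 1000.0)

print(f"mean moves by   {mean_shift:.2f}")
print(f"median moves by {median_shift:.2f}")
```

An outcome built on the median is robust in the sense used here: no single small piece of input can change it much.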
It’s All Still Probabilities
Artificial intelligence systems are probabilistic. Conventional computing systems, by contrast, are deterministic – either on or off, true or false, yes or no, zero or one. AI outputs are probabilities: a data system reports how likely a given output is, whether that output is a medical diagnosis, a likely set of next steps in an algorithm, or the optimal choice in a decision-making process or recommendation engine. (Of course, anyone with a Netflix or Amazon account knows how far “recommendation engines” still have to go; their suggestions are mostly laughably useless.)
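To illustrate what “outputs are probabilities” means in practice, here is a minimal sketch of the softmax function, a common way to turn a model’s raw scores into probabilities. The labels and scores are hypothetical, invented for this example; no real diagnostic model is implied.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical labels and model scores, for illustration only.
labels = ["condition A", "condition B", "healthy"]
raw_scores = [2.0, 1.0, 0.1]

probabilities = softmax(raw_scores)
for label, p in zip(labels, probabilities):
    print(f"{label}: {p:.2f}")

# The "answer" is just the most probable label, never a certainty.
best = labels[probabilities.index(max(probabilities))]
print("most likely:", best)
```

The system never says “yes” or “no”; it ranks possibilities, and a person or a downstream rule decides how much probability is enough to act on.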
As AI permeates all systems and processes, we increasingly live in a world of probabilities. Mathematically, this means bringing probabilistic logic, statistics, and stochastic reasoning to bear on mountains of data. Computer science has not traditionally thought this way. AI systems complicate formal reasoning and system design, and adequate solutions have not yet been developed.
What Causes What?
Causality – identifying which inputs, processed through AI and machine learning algorithms, generate specific outcomes – may be the single greatest challenge of data science and data systems.
Machine learning algorithms and models find patterns, correlations, and associations. But they do not identify causality. Nor are these systems very good at predicting, with dependable accuracy, which changes will generate which outcomes.
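A small simulation can show why a correlation is not a cause. In the sketch below, which uses invented variables and made-up noise levels, a hidden factor (temperature) drives both ice cream sales and sunburns. The two are strongly correlated in observed data, yet when ice cream sales are set independently of temperature, the correlation disappears.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical hidden common cause.
temperature = [rng.gauss(0, 1) for _ in range(n)]

# Both observed variables depend on temperature, not on each other.
ice_cream = [t + rng.gauss(0, 0.5) for t in temperature]
sunburn = [t + rng.gauss(0, 0.5) for t in temperature]
observed = pearson(ice_cream, sunburn)

# Intervention: set ice cream sales without regard to temperature.
forced_ice_cream = [rng.gauss(0, 1) for _ in range(n)]
intervened = pearson(forced_ice_cream, sunburn)

print(f"observed correlation:   {observed:.2f}")
print(f"after the intervention: {intervened:.2f}")
```

A pattern-finding algorithm given only the first two columns would report a strong association, but changing ice cream sales would do nothing to sunburns.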
Inference and reasoning take on new prominence because of this. Traditionally, statistical analysis attempted to understand causality, and its methods are fairly well understood. However, while these contributions are fundamental, applying causal reasoning to swarms of data in computer science is still new ground, requiring innovation and more thoughtful algorithms and processes.
Data science is transforming every area, not just academic fields. Data science methods for generating, collecting, and analyzing digital data permeate all areas of modern life, and they will from now on.
Perhaps It’s Time to Think Like a Computer Scientist?
Thinking about these topics requires building on the power and limits of computing processes, whether executed by a human or by a machine. Computational methods and models give us the ability to solve problems and design systems capable of addressing problems no single researcher could tackle.
Computational thinking (as defined by Jeannette Wing of Carnegie Mellon) confronts the riddle of machine intelligence: What can humans do better than computers, and vice versa? More fundamentally, it addresses the question: What is computable?
Isn’t Everything Computable?
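Not strictly: some problems, such as the halting problem, are provably not computable. And even when a computation does produce a result, that result is not necessarily meaningful.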
Solving complex and seemingly intractable problems requires designing innovative data systems while simultaneously understanding human behavior. That’s no small trick. Computer science is an essential tool with the theoretical underpinnings to answer challenging questions precisely.
Understanding the problem as accurately as possible, defining not only the issues but also the difficulty of solving them, highlights the underlying power of computer science. It uses a computing device and a system to generate a solution that would not otherwise be possible through human computation.
But the machine’s instruction set, its resource constraints, and its operating environment impose limitations. For example: Are we getting only approximations? Are we finding patterns in randomness? Do we have false positives? Do we really know how to solve this, or are we simply generating an algorithm that gives “an answer” but not “the answer”? Many business models portray AI as a solution or even a product. It is neither. It is a tool that can be misused, inaccurate, or insufficient, and it is also an extraordinary development enabling otherwise unachievable results.
Is This Dangerous?
There are both virtues and dangers in analysis that may lack appropriate context or may generate conclusions that are statistically relevant but not necessarily accurate. One of the critical dangers of computer science is that it generates a precise answer – we just don’t know if it’s accurate.
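The gap between precision and accuracy can be sketched with a few lines of simulation. The example below assumes a hypothetical instrument with a fixed systematic error; the true value, the bias, and the noise level are all invented for illustration. With enough samples the estimate becomes extremely precise, with a very narrow confidence interval, and yet it is wrong.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)  # fixed seed so the run is repeatable

true_value = 50.0  # hypothetical quantity being measured
bias = 5.0         # hypothetical systematic error (e.g. miscalibration)
n = 10_000

# Every sample carries the same bias plus random noise.
samples = [true_value + bias + rng.gauss(0, 2.0) for _ in range(n)]

estimate = mean(samples)
std_error = stdev(samples) / n ** 0.5

# Approximate 95% confidence interval around the estimate.
low = estimate - 1.96 * std_error
high = estimate + 1.96 * std_error

print(f"estimate: {estimate:.2f}")
print(f"95% interval: [{low:.2f}, {high:.2f}]")
print("interval contains the true value:", low <= true_value <= high)
```

More data narrows the interval but does nothing about the bias, which is why a precise answer cannot be taken as an accurate one without independent verification.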
We often use abstraction and decomposition when addressing a large, complex task or designing a large, complex system. Often this means choosing an appropriate representation for a problem, or modeling only its relevant aspects, to make it tractable. Otherwise, solutions seem out of reach. But for intractable problems to become tractable, systems must be sustainable, safe, modifiable, and applicable to large, complex sets of data whose solutions would otherwise be enormous and unreachable.
In other words, we have a long way to go.
Protein Folding, Anyone?
However, we are seeing spectacular results, and we are at the vanguard of even greater advancements in knowledge, its applications, and pervasive benefits.
The most recent example is DeepMind’s protein folding prediction AI. Since the human body does something only because a protein told it to, understanding protein creation and function would be an extraordinary medical breakthrough. Innumerable proteins are formed from combinations of 20 amino acids. This much is well understood by any life-sciences researcher. However, how these amino acids combine to form proteins, and how those proteins fold and combine to form an almost infinite number of protein signals, is a mystery beyond human comprehension and analysis. But DeepMind’s AI system and enormous computing power generated accurate predictions, even beyond the imagination of life-sciences researchers.
And That’s Just the Beginning
This is the beginning of the kind of AI applications that can permeate all industries and even our personal lives. This kind of thinking is producing solutions to seemingly impossible, intractable problems that once looked far-fetched. Now, the true quality of each solution needs to be tested and retested. The advantage of starting with the life sciences is that we can see and test accuracy in a lab. But many dimensions of human behavior and policy, and the predictability of most other aspects of human nature and our everyday lives, remain elusive.
The scientific method insists we “see first,” then test and verify. But what have we actually seen? How can we test it? And how capable are we of verifying something that essentially uses a set of mysterious tools to generate its outcome? How can we turn around and verify that with any other methodology? Right now, that is impossible.
We have the opportunity for enormous progress in understanding and predicting, but we have to see it through a filter of increasing limitations on how we can test the probabilities being generated and verify the authenticity of the results. Our computational approach may generate increasingly precise answers with increasing uncertainty and less accuracy.
A New Way to Think
Algorithms and algorithmic reasoning (filtering cause and effect through a formula) can be an effective way to discover solutions, learn new pathways, manage uncertainty, think creatively, and generate useful output (as in the case of DeepMind’s protein folding predictions) while using increasingly massive amounts of data and processing power.
Machine learning is being used for problems on a scale, in terms of both data size and dimension, unimaginable only a few years ago. As we have seen, artificial intelligence’s contribution to biology goes beyond the ability to search through vast amounts of sequence data looking for patterns. Data structures, algorithms, and computational abstractions represent the structure of proteins in ways that elucidate their function, changing the way biologists think, and generating a massive leap forward for life sciences.
Similarly, these tools applied to game theory are changing the way economists think; nanocomputing, the way chemists think; and quantum computing, the way physicists think. This kind of thinking will be part of the skill set not only of scientists but of everyone. Ubiquitous computing is here today in the form of advanced artificial intelligence and machine learning systems, and it is affecting more and more of our daily functions.
Thinking Like a Human – and a Computer Scientist
Computer science is not computer programming, and it means more than being able to program a computer. It requires thinking at multiple levels of abstraction, which is the way humans solve problems. While the systems and processes are not trying to be human (nor are they trying to get humans to think like computers), it is human interaction with computer science-based tools and processes that enables our greatest progress.
Computers are dull and boring; humans are clever and imaginative. Humans make computers exciting. Equipped with computing devices, we use our cleverness to tackle problems we would not have dared to take on before the age of computing, and we can build systems with functionality limited only by our imaginations.
It’s not just the software and hardware artifacts we produce that will be physically present everywhere and touch our lives all the time. It will also be the computational concepts we use to approach and solve problems, manage our daily lives, and communicate and interact with other people. Computational thinking will have become a reality when it is so integral to our lives that it disappears.
The problems and solutions we address are limited only by our own curiosity and creativity.