Big Math Data: possibilities and challenges

Jinfang Wang

Institute: Department of Mathematics and Informatics, Graduate School of Science, Chiba University, Japan.


2015/08/05 Wed 4PM-4:50PM

The computer has influenced all the sciences, and the mathematical sciences are no exception. Mathematicians have long sought a new foundation of mathematics to replace ZFC (Zermelo-Fraenkel set theory with the axiom of choice) and category theory, both of which have been successful to a great extent. Indeed, a theory known as type theory is emerging as a powerful alternative to these traditional foundations. In type theory, every mathematical object is represented as a type.
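To make the idea concrete, here is a minimal sketch in Lean (a proof assistant in the same type-theoretic family as the systems discussed below; this particular example is mine, not from the talk). Under the propositions-as-types correspondence, a proposition is a type and a proof is a term inhabiting that type:

```lean
-- A proposition is a type; a proof is a term of that type.
-- The theorem statement `p ∧ q → q ∧ p` is itself a type,
-- and the function below is a term inhabiting it, i.e. a proof.
theorem and_comm' (p q : Prop) : p ∧ q → q ∧ p :=
  fun h => ⟨h.2, h.1⟩
```

When the system type-checks this term, it has verified the proof: checking a proof reduces to type checking, which is what makes mathematical reasoning digitizable.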

Various formal proof systems, including HOL, Isabelle, Idris, Coq, and Agda, are based on this theory. Thanks to type theory, it is becoming a reality that mathematical reasoning can be digitized. Philosophers, logicians, computer scientists, and mathematicians alike have made great efforts, and great progress, in formalizing various mathematical theories. Recent breakthroughs include, but are not limited to, the computer-verified proofs of the Four Color Theorem (2004), the Feit-Thompson Theorem (2012), and the Kepler Conjecture (2014).

To formalize the proofs of these theorems, large amounts of mathematical theory have been digitized and stored in the form of libraries (analogous to the R packages familiar to statisticians). For instance, the formal proof of the Feit-Thompson Theorem involved some 170,000 lines of code, with more than 15,000 definitions and 4,200 lemmas. These data, referred to hereafter as Big Math Data, open a new paradigm and present serious challenges for statisticians: analyzing a kind of data we have never encountered before, namely mathematical theories themselves. The figure on the right shows some of the libraries that form SSReflect, an extension of the interactive theorem prover Coq. Many other libraries are available as by-products of the formalization of various mathematical theories.
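As a toy illustration of treating a formal library as data, the sketch below counts top-level definitions and lemmas in a fragment of Coq vernacular. Both the fragment and the counting approach are hypothetical simplifications of mine; real SSReflect sources are vastly larger and would call for proper parsing rather than regular expressions:

```python
import re

# A toy fragment of Coq vernacular, standing in for a real library file.
# (Hypothetical example; actual SSReflect sources are far larger.)
coq_source = """
Definition double (n : nat) := n + n.
Lemma doubleE n : double n = 2 * n.
Proof. by rewrite /double mul2n. Qed.
Lemma double0 : double 0 = 0.
Proof. by []. Qed.
"""

# Count top-level Definitions and Lemmas -- the simplest possible
# "statistic" one might compute over Big Math Data.
counts = {
    kind: len(re.findall(rf"^\s*{kind}\b", coq_source, flags=re.MULTILINE))
    for kind in ("Definition", "Lemma")
}
print(counts)  # {'Definition': 1, 'Lemma': 2}
```

Figures such as the 15,000 definitions and 4,200 lemmas quoted above are, in spirit, summary statistics of exactly this kind computed over an entire proof development.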

In this talk, I shall give a gentle introduction to Big Math Data and describe possible mathematical and statistical challenges in both obtaining and analyzing such data.
