Research Statement
1. Introduction.
1.1. Topological data analysis. For the past 20 years Topological Data Analysis has been a vibrant area of research a lot due to the developments in applied and computational algebraic topology. Essentially it applies the qualitative methods of topology to problems of machine learning, data mining and computer vision. In particular, persistent homology is an area of mathematics interested in identifying a global structure by inferring high-dimensional structure from low-dimensional representations and studying properties of a often continuous space by the analysis of a discrete sample of it, assembling discrete points into global structure. When considering a notion of distance on the space, one gets a perspective of the space under different scales, where small features will eventually disappear. Persistence allows us to compute the homology at all scales, thereby giving us the ability to find ranges of scales where the structure of the space is stable. Techniques of persistence can be used to infer topological structure in data sets while certain variations on the method can be applied to study aspects of the shape of point clouds. By considering all possible scales, one can infer the correct scale at which to look at the point cloud simply by looking for scales where the persistent homology is stable. Multi-scale methods are thus enable to study the topology of point clouds as a route for approximating topological features of an unobservable geometric object generating samples. For these reasons, barcodes seen as multi-scale signatures are of great importance to the application of this research to the development of new techniques in machine learning. In [9], the authors study the ring of algebraic functions on the shape of persistence barcodes and apply this knowledge to pattern recognition. Such theory leads to topological data analysis particularly for large, high-dimensional data where topological methods allow us to transfer information from local to global structures. For good introductions to the field and its uses, I recommend [6], [13] and [29], and for an accessible introduction to algebraic topology I recommend [16].
1.2. Significance. Persistence is a field of study of recognized importance interested in problems relating to nonlinear systems, large scale data and development of more accurate models, having a large impact on: social media (with the work [10] of Hubert Wagner et al, implemented within Google Code, where they apply topological techniques to the analysis of text data, using the global geometric information given by persistence to describe the entire set of documents); robotics (with the work [20] of Danica Kragić and Florian Pokorny, adapting ideas from topology for application in robotic grasping and machine learning using shortest homology generators, winding numbers and Gauss linking integrals); medicine and cancer research (with the work [21] of Monica Nicolau et al, in the biology of disease by the development of novel techniques to tackle data analysis problems within a computational approach that combines geometric data transformations and topological methods); natural image statistics (with the work [7] of Carlsson et al, where the authors develop qualitative topological analysis of the local behavior of the space of natural images towards a theoretical model for the high-density 2-dimensional submanifold showing that it has the topology of the Klein bottle); machine learning in wikipedia (with the work [14] of H. Edelsbrunner et al, on persistence of self maps where the authors learn maps estimated locally to globally - when given wikipedia as a partial map, we can learn a map the entire space, having different languages, what can this map tells us?); and even on the new technology in archeology (with the work [24] of Robert Adler and Michael Wermen, that present an object descriptor for supervised classification based on the Euler characteristic of subsets created by thresholding a function defined over the domain at multiple levels). The enterprise Ayasdi is a software company that sells big data analytics technology to individuals looking to analyze high dimensional, high volume data sets. It applies topology to data analysis in the areas of life sciences, public sector, financial services, manufacturing, retail, telecom, and sports. Organizations and companies have used Ayasdi to uncover treatment for diseases, professional sports team analysis, pharmaceutical breakthroughs, and more.
1.3. Further applications. Much of the applications within the recent research require manipulation and comparison of persistence diagrams. Within this intention, persistence diagrams can be dealt as objects of a topos permitting new approaches to the study and a deeper understanding of the relationship between such structures. Their algebraic structure consisting of a complete Heyting algebra, discussed later in this document, enables the study of its general properties and permits the development of new algorithms. It is the main objective of the proposed project to deliver such applications, in areas of interest as machine learning or sensor networks, according to the current research within the community.
2. Problem.
2.1 Topos foundation. The topological information of a filtration can be encoded in a set of closed intervals, called a barcode, where is indicated the birth and death time of every connected component of the sample of a given topological space. This barcode can be represented by vertices in a plane where the first coordinate corresponds to the birth time and the second coordinate corresponds to the death time. A very good review on the subject is presented in [15], determining the setting of the further work in this project. Our first attempt was to consider as our Heyting algebra the collection of open intervals in R, with the analogy of union taken to be the smallest covering interval. I demonstrate in that this construction fails to be a distributive lattice. Instead, as our next and more successful attempt, I study the Heyting algebra of collections of vertices in a persistence diagram. In order to guarantee the axioms necessary to define a Heyting algebra I need to allow virtual persistence: points below the diagonal considered under the light of [12]. The order structure corresponds to the set inclusion of the correspondent bars in the barcode. Classical persistence will emerge as a subcategory of the resulting topos of sheaves. I am able to identify the arrow operation and the pseudo complements in this complete Heyting algebra. I have also studied its spectrum (i.e., its set of prime ideals) and their relation with the join-irreducible elements that permit us the study of the dual space of such algebra. It is not yet clear their relation with the ring underneath the spectral spaces correspondent to the dual space achieved and the ring of algebraic functions studied in [9].
2.2. Internal logic. In [28], Vejdemo-Johasson reviews in detail the algebraic foundations of persistent homology, and presents the idea of a topos-based approach as a potential unifying language for these various approaches. Such an approach encodes the various flavors of persistent homology as a topos of sheaves over some Heyting algebra encoding the shape of the persistence theory. The inspiration for our approach is to some extent rooted in the exposition by Barr and Wells [1], where sheaves of sets are described as sets with a temporally varying shape. The shape corresponds to the shape of the underlying site, which also describes the shape of the available truth values for the corresponding logic. Classical logic and set theory would correspond to having two discrete truth values; fuzzy logic to a continuum of truth values encoding reliability of a statement, and the persistent approach would encode truth as valid over some regions of a persistence parameter, but not other. Based on this as an inspirational source, I aim to develop the theory of persistent homology as the internal homology of simplicial complexes over a set theory in which elements have life-times; where elements of sets, simplices of simplicial complexes, and base vectors of vector spaces are born, live, and die, much like the language has evolved for discussing persistent homology.
2.3. Time sheaves. In such a setting, a filtered topological space corresponds to a topological space where parts of the space come in at later times; the construction of the homology functor immediately provides homology groups where elements come in and go away as time flows. In the proposed project, I explore such a topos-based approach in more detail, pursuing a formulation that would make the topos setting clear and amenable for generalization. Fundamental to such an approach is the formulation of an underlying space, a site, such that the sheaves of vector spaces over this site correspond to persistence modules. I am fundamentally interested in the algebraic foundations of applied and computational algebraic topology, in particular in foundations for the various flavors of persistent homology that have emerged. So far, the literature on persistent homology and its applications has been interested in standard persistent homology, multi-dimensional persistence and zig-zag persistence. For each of these cases, the recognition of an underlying algebraic structure has contributed both to the identification of new problems that could be solved and to the development of new algorithms.
3. Methodology.
3.1 From local to global. The standard tools in topology for tracking locally defined information attached to the open sets of a topological space and transferring it to a global scenario are studied by sheaf theory. These tools organize mathematical information enabling easy shifts in perspective between general, particular and even singular approaches to the same geometric objects - functions defined on parts of a geometric object are encoded as elements, and their restrictions and combination yielding larger or smaller domains form the prototype for the algebraic structure of a sheaf. Sheaves were originally developed as computational tools in geometry and topology, and their importance in contemporary algebraic geometry and algebraic topology is foundational. By viewing machine learning primitives as these geometric objects, for instance, sheaves enable us to model locally varying machine learning tasks and thereby approach an encoding of subtly inter-related local models. Sheaf Theory has had a great contribution to the study of geometric structures such as that of a scheme, enabling it to be expressed in terms of a sheaf of rings on the space. Also, sheafs provide us with a general cohomology theory that encompasses a powerful link between topological and geometric properties of spaces, measuring among other things the failure modes to being able to extend a locally defined element to a globally defined one. The Bernshtein-Gelfand-Gelfand correspondence yields an approach to computing sheaf cohomology by translating the problem away from sheaves and into working with modules over a particular ring, where Grobner basis approaches and classical computational commutative algebra work.
3.2. Topos unification. Behaving as the category of sheaves of sets on a topological space, a topos is a cartesian closed category with enough structure to produce an object of subobjects for each object. Topoi are of interest for the purpose of modeling computation as subobjects need not have complement. They have been proposed as an alternative to the category of sets for the foundation of Mathematics and most ways of defining the category of fuzzy sets lead to a category which can be embedded in a topos. Some authors believe that topos theory offers methodologies for comparing different areas of study in mathematics (as in [5]) providing effective means for transferring results and techniques between distinct fields. Hence, its fundamental role in the unification of mathematics, illuminating the nature of concepts, establishing new theorems, suggesting wider examples and promoting new lines of investigation. A fundamental result in quantum mechanics shows the usefulness of sheaves for probabilistic reasoning in this area: the no-go theorem by Kochen-Specker, constraining the kinds of hidden variable theories that can model quantum mechanical processes. Abramsky and Brandenburger were able to generalize this theorem using a sheaf-based representation modeling contextually in quantum mechanics by sheaf-theoretic obstructions.
3.3. Lattice ordered structures. Transversal to most of the areas of study in Mathematics is the analysis of the order structure. Posets are frequent objects of study in topology due to their potential for topological combinatorics (when considering the open sets to be the upper sets of a poset aiming for the construction of the Alexandrov topology). A poset is a lattice when all pairs of elements in the poset have a infimum and a supremum (greatest lower bound and least upper bound, respectively). Indeed, these are minimal conditions for a poset to acquire an algebraic structure with two associative and commutative operations that have a certain absorption relation between them. Lattices are present in the everyday life of a working mathematician being distributive lattices some of the most important varieties of these algebras. Examples of lattices are abundant as the power set of A ordered by subset inclusion or the collection of all partitions of A ordered by refinement. A recent approach to the study of persistent homology using techniques of lattice theory is presented in [27] where several algorithmic applications imply the impact of these strategies.
3.4. Duality. Boolean algebras are bounded particular cases of such, in which one can consider a complement for each element. Distributivity guarantees its uniqueness. These distributive lattices are very well behaved capturing essential properties of both set operations and logic operations. Heyting algebras are more general structures where sometimes the complement fails to exist: its correspondent is named pseudo complement coinciding with the complement when the lattice is a Boolean algebra. The lattice of open sets of a topological space X forms a Heyting algebras. The category of sheaves on a Heyting algebra is a topos (cf. [17]). Distributive skew lattices are sheaves over distributive lattices (cf. [2]) and their structure has been studied in [19]. Furthermore, every Boolean algebra is isomorphic to a field of sets (named Stone duality and studied in [17]), and any Heyting algebra is isomorphic to an algebra of opens (named Esakia duality and studied in [3]). This categorical equivalence permit us solving problems in one framework and transferring the results to the other. It has been proved very useful, for instance, for the proof of Tychonoff theorem without using choice: transferring the problem from topological spaces over to the algebraic side and using locales, then one can show that a product of compact locales is compact, without choice (cf. [17]).
4. Outcomes.
4.1. Algorithmic constructions. A topos theoretic strategy consists of a framework of recognized interest due to its achievements in several other areas as earlier described. With this framework we intend to develop algorithms intending to build structures on persistent diagrams and explore several experiments using already existing packages based in C++ and Python, using concrete data sets as wikipedia or sensor network data.. New approaches and better understanding of algorithmic constructions for persistence are made possible. A similar work regarding commutative diagrams of vector spaces and linear maps that appear in the study of persistence lead the author, in collaboration with P. Škraba, to several relevant applications in their published preprint [27]. This hints us for the usefulness of the study of order structure and its consequences to general algorithmic applications in the context of persistence.
4.2. Applications to data analysis. With these achievements, we focus the development of applications mainly to model systems (in a complementary development to the work M. Robinson in [25]) and machine learning problems (in a collaboration work with M. Vejdemo-Johansson as in [28]). Robinson is interested in the influence of sheaf theory for a deeper understanding of sensor networks and their applications, studying complicated problems on small networks, i.e. locally defined structures, transferring their information to globally valid inferences by way of consistency relations. On the other hand, working in a machine learning context, Vejdemo-Johansson uses sheaves to take compatible local descriptions and glue them together to a global cohesive picture, aiming to reach an algebraic method for finding out which global pictures are possible to combine from local pieces.
4.3. Universal laws for persistence. Constructing the topos of sheaves over the complete Heyting algebra of bars and describing its unifying properties will lead us to establish a logic and inference to the study of persistence. Having achieved a complete Heyting algebra as a base structure to the research, the analysis of its arrow operation is of great value leading us to several general laws to be applied everywhere. The internal logic of any elementary topos, studied in [17] and [1], is based on Heyting algebras of subobjects of the terminal object ordered by inclusion. The universal constructions established for these distributive lattices permit us a great deal of algorithmic applications.
4.4. Topological space of barcodes. The topological information that can be extracted by the usage of duality theory over the Heyting algebra of a persistence diagram is an original approach to the classical study of the topological features by barcodes. It is an established connection between algebra and topology (cf. [3] and [8]), and it has proved its efficiency over the last 50 years. Its usage to solve problems of one area and transfer the knowledge to the other area is of great interest, in particular to my intent of a global perspective on persistence. It is well known that the sheaves over distributive lattices consist of skew lattices (cf. [2]), a non-commutative version of lattices studied by this author in [19] as well as in his Ph.D. in [22]. I believe that this knowledge is of great importance to the further research of these matters.
4.5. New problems. Consequences for lattice theory are evident, not only with the development of clear applications that permit the creation of new problems within the theory and motivate new strategies and ideas, but also with the implementation of new techniques in lattice theoretic proofs (as the usage of short exact sequences for the proof of distributivity in [27]).
5. References.
[1] M. Barr and C. Wells. Category theory for computing science, volume 10. Prentice Hall, 1990.
[2] A. Bauer, K. Cvetko-Vah, M. Gehrke, S. J. van Gool, and G. Kudryavtseva. A non-commutative Priestley duality. Topology and its Applications, 2013.
[4] G. Birkhoff. Lattice theory, volume 5. AMS Colloquium Publications, Providence RI, third edition, 1940.
[5] O. Caramello. The unification of Mathematics via Topos Theory. arXiv: 1006.3930, 2010.
[6] G. Carlsson. Topology and data. Bulletin-American Mathematical Society, 46(2):1{54, 2009.
[7] G. Carlsson, T. Ishkhanov, V. de Silva and A. Zomorodian. On the local behavior of spaces of natural images. Int. J. Comput. Vis., 76:1{12, 2008.
[8] B. A. Davey and H. A. Priestley. Introduction to lattices and order. Cambridge University Press, 2002.
[9] Aaron Adcock, Erik Carlsson and Gunnar Carlsson. The Ring of Algebraic Functions on Persistence Bar Codes. arXiv:1304.0530, 2013.
[10] H. Wagner, P. Dlotko and M. Mrozek. Computational topology in text mining. in: Computational Topology in Image Context, 4th International Workshop, CTIC 2012, Bertinoro, Italy, May 28-30, 2012. Proceedings. published version in: Lecture Notes in Computer Science 7309, 68-78, DOI: 10.1007/978-3-642-30238-1-8, 2012.
[11] C. J. Isham and J. Butterfield Some possible roles for topos theory in quantum theory and quantum gravity. Journal of Foundations of Physics, 30(10):17071735, 2000.
[12] H. Edelsbrunner P. Bendich, S. Cabello. A point calculus for interlevel set homology. Pattern Recognition Letters, 33(11):1436{1444, 2012.
[13] H. Edelsbrunner and J. L. Harer. Computational Topology: An Introduction. American Mathematical Society, 2010.
[14] H. Edelsbrunner, G. Jablonski and M. Mrozek. The Persistent Homology of a Self-map. Preprint.
[15] R Ghrist. Barcodes: the persistent topology of data. Bulletin-American Mathematical Society, 45(1): 61, 2008.
[16] A. Hatcher. Algebraic Topology. Hatcher, December 2000.
[17] P. T. Johnstone. Stone Spaces. Cambridge University Press, 1986.
[18] P. T. Johnstone. Sketches of an elephant: a topos theory compendium. Oxford Science Publications, 2002.
[19] J. Leech M. Kinyon and J. Pita Costa. Distributive skew lattices. Submitted to Semigroup Forum, 2013.
[20] J. A. Stork, F. T. Pokorny and D. Kragić. A Topology-based Object Representation for Clasping, Latching and Hooking. In: IEEE-RAS International Conference on Humanoid Robots (HUMANOIDS), accepted/to appear, 2013.
[21] M. Nicolau, A. Levine and G. Carlson. Topology based data analysis identities a subgroup of breast cancers with a unique mutational profile and excellent survival. Proc. Natl. Acad. Sci. USA, 108(17):7265{7270, 2011.
[24] E. Richardson and M. Werman. Efficient Classification using Euler Characteristic. Manuscript, August 2013.
[25] M. Robinson. The Nyquist theorem for cellular sheaves. arXiv: 1307.7212, August 2013.
[26] P. Škraba and M. Vejdemo-Johansson. Persistence modules: Algebra and algorithms. arXiv: 1302.2015, February 2013.
[27] P. Škraba and J. Pita Costa. A lattice for persistence. arXiv: 1307.4192, July 2013.
[28] M. Vejdemo-Johansson. Sketches of a platypus: persistent homology and its algebraic foundations. arXiv:1212.5398, November 2012.
[29] A. J. Zomorodian. Topology for computing, volume 16. Cambridge University Press, 2005.