## Learning, search, social computation and networks

### 2013

##### Abstract

In the practice of information extraction, the input data are usually
arranged into *pattern matrices*, and analyzed by the methods of
linear algebra and statistics, such as principal component
analysis. In some applications, the tacit assumptions of these methods
lead to wrong results. The usual reason is that the matrix composition
of linear algebra presents information as flowing in waves, whereas it
sometimes flows in particles, which seek the shortest paths. This
wave-particle duality in computation and information processing has
been originally observed by Abramsky. In this paper we pursue a
particle view of information, formalized in *distance spaces*,
which generalize metric spaces, but are slightly less general than
Lawvere's generalized metric spaces. In this framework, the
task of extracting the 'principal components' from a given matrix of
data boils down to a *bicompletion*, in the sense of enriched
category theory. We describe the bicompletion construction for
distance matrices. The practical goal that motivates this research is
to develop a method to estimate the hardness of attack constructions
in security.

### 2012

##### Abstract

Formal Concept Analysis (FCA) begins from a
context, given as a binary relation between some objects and some
attributes, and derives a lattice of concepts, where each concept is
given as a set of objects and a set of attributes, such that the first
set consists of all objects that satisfy all attributes in the second,
and vice versa. Many applications, though, provide contexts with
quantitative information, telling not just whether an object satisfies
an attribute, but also quantifying this satisfaction. Contexts in
this form arise as rating matrices in recommender systems, as
occurrence matrices in text analysis, as pixel intensity matrices in
digital image processing, etc. Such applications have attracted a lot
of attention, and several numeric extensions of FCA have been
proposed. We propose the framework of *proximity sets (proxets)*,
which subsume partially ordered sets (posets) as well as metric
spaces. One feature of this approach is that it extracts from
quantified contexts quantified concepts, and thus allows full use of
the available information. Another feature is that the categorical
approach allows analyzing any universal properties that the classical
FCA and the new versions may have, and thus provides structural
guidance for aligning and combining the approaches.

### 2010

##### Abstract

In a previous FAST paper, I presented a
quantitative model of the process of trust building, and showed that
trust is accumulated like wealth: the rich get richer. This explained
the pervasive phenomenon of adverse selection of trust certificates,
as well as the fragility of trust networks in general. But a simple
explanation does not always suggest a simple solution. It turns out
that it is impossible to alter the fragile distribution of trust
without sacrificing some of its fundamental functions. A solution for
the vulnerability of trust must thus be sought elsewhere, without
tampering with its distribution. This observation was the starting
point of the present paper. It explores a different method for
securing trust: not by redistributing it, but by mining for its
sources. The method used to break privacy is thus also used to secure
trust. A high level view of the mining methods that connect the two is
provided in terms of *similarity networks*, and *spectral
decomposition* of similarity preserving maps. This view may be of
independent interest, as it uncovers a common conceptual and
structural foundation of mathematical classification theory on one
hand, and of the spectral methods of graph clustering and data mining
on the other hand.

### 2009

##### Abstract

Game theoretic equilibria are mathematical expressions of
rationality. Rational agents are used to model not only humans and
their software representatives, but also organisms, populations,
species and genes, interacting with each other and with the
environment. Rational behaviors are achieved not only through
conscious reasoning, but also through spontaneous stabilization at
equilibrium points.
Formal theories of rationality are usually guided by informal
intuitions, which are acquired by observing some concrete economic,
biological, or network processes. Treating such processes as instances
of computation, we reconstruct and refine some basic notions of
equilibrium and rationality from the some basic structures of
computation.

It is, of course, well known that equilibria arise as fixed points;
the point is that semantics of computation of fixed points seems to be
providing novel methods, algebraic and coalgebraic, for reasoning
about them.

### 2008

##### Abstract

Trust is often conveyed through delegation, or through
recommendation. This makes the trust authorities, who process and
publish trust recommendations, into an attractive target for
attacks and spoofing. In some recent empiric studies, this was shown
to lead to a remarkable phenomenon of * adverse selection*: a
greater percentage of unreliable or malicious web merchants were found
among those with certain types of trust certificates, then among those
without. While such findings can be attributed to a lack of diligence
in trust authorities, or even to conflicts of interest, our analysis
of trust dynamics suggests that the public trust networks would
probably remain vulnerable even if the trust authorities were
perfectly diligent. The reason is that the main processes of trust
building, under reasonable modeling assumptions, naturally lead to
power-law distributions of trust: the rich get richer, the trusted
attract more trust. The evolutionary processes governed by such
distributions, ubiquitous in nature, are known to be robust with
respect to random failures, but vulnerable to adaptive attacks. We
recommend some ways to decrease this vulnerability, and suggest some
ideas for exploration.

##### Abstract

We explore a simple mathematical model of
network computation, based on Markov chains. Similar models apply to a
broad range of computational phenomena, arising in networks of
computers, as well as in genetic, and neural nets, in social networks,
and so on. The main problem of interaction with such spontaneously
evolving computational systems is that the data are not uniformly
structured. An interesting approach is to try to extract the
semantical content of the data from their distribution among the
nodes. A concept could then be identified by finding the community of
nodes that share it. The task of data structuring is thus reduced to
the task of finding the network communities, as groups of nodes that
together perform some non-local data processing. Towards this goal, we
extend the ranking methods from nodes to paths. This allows us to
extract some information about the likely flow biases from the
available static information about the network.

##### Abstract

Originally, quantum probability theory was
developed to analyze statistical phenomena in quantum systems, where
classical probability theory does not apply, because the lattice of
measurable sets is not necessarily distributive. On the other hand, it
is well known that the lattices of concepts, that arise in data
analysis, are in general also non-distributive, albeit for completely
different reasons. In his recent book, van Rijsbergen (2004) argues
that many of the logical tools developed for quantum systems are also
suitable for applications in information retrieval. I explore the
mathematical support for this idea on an abstract vector space model,
covering several forms of data analysis (information retrieval, data
mining, collaborative filtering, formal concept analysis...), and
roughly based on an idea from categorical quantum mechanics (Abramsky
& Coecke 2004; Coecke & Pavlovic 2007). It turns out that quantum
(i.e., noncommutative) probability distributions arise already in this
rudimentary mathematical framework. We show that a Bell-type
inequality (Bell 1964) must be satisfied by the standard similarity
measures, if they are used for preference predictions. The fact that
already a very general, abstract version of the vector space model
yields simple counterexamples for such inequalities seems to be an
indicator of a genuine need for quantum statistics in data analysis.