Research

Testing Tail Weight of a Distribution Via Hazard Rate [pdf]

Maryam Aliakbarpour, Amartya Shankha Biswas, Kavya Ravichandran, Ronitt Rubinfeld

International Conference on Algorithmic Learning Theory (ALT) 2023

Understanding the shape of a distribution of data is of interest to people in a great variety of fields, as it may affect the types of algorithms used for that data. Given samples from a distribution, we seek to understand how many elements appear infrequently, that is, to characterize the tail of the distribution. We develop an algorithm based on a careful bucketing scheme that distinguishes light-tailed distributions from non-light-tailed ones via a definition based on the hazard rate under some natural smoothness and ordering assumptions. We verify our theoretical results empirically.

Local Access to Random Walks [pdf]

Amartya Shankha Biswas, Edward Pyne, Ronitt Rubinfeld

Innovations in Theoretical Computer Science Conference (ITCS) 2022

For a graph \(G\) on \(n\) vertices, naively sampling the position of a random walk of at time \(t\) requires work \(\Omega(t)\). We desire local access algorithms supporting \(position(t)\) queries, which return the position of a random walk from some fixed start vertex \(s\) at time \(t\), where the joint distribution of returned positions is \(1/poly(n)\) close to those of a uniformly random walk in \(\ell_1\) distance.

We first give an algorithm for local access to random walks on a given undirected \(d\)-regular graph with \(\widetilde{O}(\frac{1}{1-\lambda}\sqrt{n})\) runtime per query, where \(\lambda\) is the second-largest eigenvalue of the random walk matrix of the graph in absolute value. Since random \(d\)-regular graphs \(G(n,d)\) are expanders with high probability, this gives an \(\widetilde{O}(\sqrt{n})\) algorithm for a graph drawn from \(G(n,d)\) whp, which improves on the naive method for small numbers of queries.

We then prove that no algorithm with subconstant error given probe access to an input \(d\)-regular graph can have runtime better than \(\Omega(\sqrt{n}/\log(n))\) per query in expectation when the input graph is drawn from \(G(n,d)\), obtaining a nearly matching lower bound. We further show an \(\Omega(n^{1/4})\) runtime per query lower bound even with an oblivious adversary (i.e. when the query sequence is fixed in advance).

We then show that for families of graphs with additional group theoretic structure, dramatically better results can be achieved. We give local access to walks on small-degree abelian Cayley graphs, including cycles and hypercubes, with runtime \(\text{polylog}(n)\) per query. This also allows for efficient local access to walks on \(polylog\) degree expanders. We show that our techniques apply to graphs with high degree by extending or results to graphs constructed using the tensor product (giving fast local access to walks on degree \(n^\epsilon\) graphs for any \(\epsilon \in (0,1]\)) and Cartesian product.

Towards a decomposition-optimal algorithm for counting and sampling arbitrary motifs in sublinear time [pdf]

Amartya Shankha Biswas, Talya Eden, Ronitt Rubinfeld

APPROX-RANDOM 2021

We consider the problem of sampling and approximately counting an arbitrary given motif \(H\) in a graph \(G\), where access to \(G\) is given via queries: degree, neighbor, and pair, as well as uniform edge sample queries. Previous algorithms for these tasks were based on a decomposition of \(H\) into a collection of odd cycles and stars, denoted \(D^*(H) = {O_{k_1},...,O_{k_q}, S_{p_1},...,S_{p_l}}\). These algorithms were shown to be optimal for the case where \(H\) is a clique or an odd-length cycle, but no other lower bounds were known. We present a new algorithm for sampling arbitrary motifs which, up to \(poly(log n)\) factors, is always at least as good, and for most graphs \(G\) is strictly better. The main ingredient leading to this improvement is an improved uniform algorithm for sampling stars, which might be of independent interest, as it allows to sample vertices according to the \(p\)-th moment of the degree distribution. Finally, we prove that this algorithm is decomposition-optimal for decompositions that contain at least one odd cycle. These are the first lower bounds for motifs \(H\) with a nontrivial decomposition, i.e., motifs that have more than a single component in their decomposition.

Massively Parallel Algorithms for Small Subgraph Counting [pdf]

Amartya Shankha Biswas, Talya Eden, Quanquan C. Liu, Slobodan Mitrović, Ronitt Rubinfeld

APPROX-RANDOM 2022

With the prevalence of large graphs, it is becoming increasingly important to design scalable algorithms. Over the last two decades, frameworks for parallel computation, such as MapReduce, Hadoop, Spark and Dryad, gained significant popularity. The Massively Parallel Computation (MPC) model is a de-facto standard for studying these frameworks theoretically. Subgraph counting is a fundamental problem in analyzing massive graphs, often studied in the context of social and complex networks. There is a rich literature on designing scalable algorithms for this problem, with the main challenge to design methods which are both efficient and accurate. In this work, we tackle this challenge and design several new algorithms for subgraph counting in MPC.

Given a graph \(G\) over \(n\) vertices, \(m\) edges and \(T\) triangles, our first main result is an algorithm that, with high probability, outputs a \((1+\epsilon)\)-approximation to \(T\), with asymptotically optimal round and total space complexity provided any \(S \geq \max{(\sqrt m, n^2/m)}\) space per machine and assuming \(T=\Omega(\sqrt{m/n})\).

Our second main result is an \(O(\log \log n)\)-rounds algorithm for exactly counting the number of triangles, parametrized by the arboricity \(\alpha\) of the input graph. The space per machine is \(O(n^{\delta})\) for any constant \(\delta\), and the total space is \(O(m\alpha)\), which matches the time complexity of (combinatorial) triangle counting in the sequential model. We also prove that this result can be extended to exactly counting \(k\)-cliques for any constant \(k\), with the same round complexity and total space \(O(m\alpha^{k-2})\). Alternatively, allowing \(O(\alpha^2)\) space per machine, the total space requirement reduces to \(O(n\alpha^2)\).

Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most \(5\), can be implemented in the MPC model in \(\tilde{O}_{\delta}(\sqrt{\log n})\) rounds, \(O(n^{\delta})\) space per machine and \(O(m\alpha^3)\) total space. Therefore, this result also exhibits the phenomenon that a time bound in the sequential model translates to a space bound in the MPC model.

Massively Parallel Algorithms for Distance Approximation and Spanners [pdf]

Amartya Shankha Biswas, Michal Dory, Mohsen Ghaffari, Slobodan Mitrović, Yasamin Nazari

Symposium on Parallelism in Algorithms and Architectures (SPAA) 2021

Over the past decade, there has been increasing interest in distributed/parallel algorithms for processing large-scale graphs. By now, we have quite fast algorithms—usually sublogarithmic-time and often \(\poly(\log\log n)\)-time, or even faster—for a number of fundamental graph problems in the massively parallel computation (\(\texttt{MPC}\)) model. This model is a widely-adopted theoretical abstraction of MapReduce style settings, where a number of machines communicate in an all-to-all manner to process large-scale data. Contributing to this line of work on \(\texttt{MPC}\) graph algorithms, we present \(poly(\log k) \in poly(\log\log n)\) round \(\texttt{MPC}\) algorithms for computing \(O(k^{1+{o(1)}})\)-spanners in the strongly sublinear regime of local memory. To the best of our knowledge, these are the first sublogarithmic-time \(\texttt{MPC}\) algorithms for spanner construction.

As primary applications of our spanners, we get two important implications, as follows:

For the \(\texttt{MPC}\) setting, we get an \(O(\log^2\log n)\)-round algorithm for \(O(\log^{1+o(1)} n)\) approximation of all pairs shortest paths (APSP) in the near-linear regime of local memory. To the best of our knowledge, this is the first sublogarithmic-time \(\texttt{MPC}\) algorithm for distance approximations.

Our result above also extends to the \clique model of distributed computing, with the same round complexity and approximation guarantee. This gives the first \emph{sub-logarithmic} algorithm for approximating APSP in \emph{weighted} graphs in the \clique model.

Local Access to Huge Random Objects Through Partial Sampling [pdf]

Amartya Shankha Biswas, Ronitt Rubinfeld, Anak Yodpinyanee

Innovations in Theoretical Computer Science Conference (ITCS) 2020

Consider an algorithm performing a computation on a huge random object (for example a random graph or a “long” random walk). Is it necessary to generate the entire object prior to the computation, or is it possible to provide query access to the object and sample it incrementally “on-the-fly” (as requested by the algorithm)? Such an implementation should emulate the random object by answering queries in a manner consistent with an instance of the random object sampled from the true distribution (or close to it). This paradigm is useful when the algorithm is sub-linear and thus, sampling the entire object up front would ruin its efficiency.

Our first set of results focus on undirected graphs with independent edge probabilities, i.e. each edge is chosen as an independent Bernoulli random variable. We provide a general implementation for this model under certain assumptions. Then, we use this to obtain the first efficient local implementations for the Erdos-Renyi \(G(n,p)\) model for all values of \(p\), and the Stochastic Block model. As in previous local-access implementations for random graphs, we support Vertex-Pair and Next-Neighbor queries. In addition, we introduce a new Random-Neighbor query. Next, we give the first local-access implementation for All-Neighbors queries in the (sparse and directed) Kleinberg’s Small-World model. Our implementations require no pre-processing time, and answer each query using \(\mathcal{O}(poly(\log n))\) time, random bits, and additional space.

Next, we show how to implement random Catalan objects, specifically focusing on Dyck paths (balanced random walks on the integer line that are always non-negative). Here, we support Height queries to find the location of the walk, and First-Return queries to find the time when the walk returns to a specified location. This in turn can be used to implement Next-Neighbor queries on random rooted ordered trees, and Matching-Bracket queries on random well bracketed expressions (the Dyck language).

Finally, we introduce two features to define a new model that: (1) allows multiple independent (and even simultaneous) instantiations of the same implementation, to be consistent with each other without the need for communication, (2) allows us to generate a richer class of random objects that do not have a succinct description. Specifically, we study uniformly random valid \(q\)-colorings of an input graph \(G\) with maximum degree \(\Delta\). This is in contrast to prior work in the area, where the relevant random objects are defined as a distribution with \(\mathcal O(1)\) parameters (for example, \(n\) and \(p\) in the \(G(n,p)\) model). The distribution over valid colorings is instead specified via a “huge” input (the underlying graph \(G\)), that is far too large to be read by a sub-linear time algorithm. Instead, our implementation accesses \(G\) through local neighborhood probes, and is able to answer queries to the color of any given vertex in sub-linear time for \(q\ge 9\Delta\), in a manner that is consistent with a specific random valid coloring of \(G\). Furthermore, the implementation is memory-less, and can maintain consistency with non-communicating copies of itself.

Sublinear-time algorithms for counting star subgraphs via edge sampling [pdf]

Maryam Aliakbarpour, Amartya Shankha Biswas, Themis Gouleakis, John Peebles, Ronitt Rubinfeld, Anak Yodpinyanee

Algorithmica 2018

We study the problem of estimating the value of sums of the form \(S_p = \sum \binom{x_i}{p}\) when one has the ability to sample \(x_i\ge 0\) with probability proportional to its magnitude. When \(p=2\), this problem is equivalent to estimating the selectivity of a self-join query in database systems when one can sample rows randomly. We also study the special case when \(\{x_i\}\) is the degree sequence of a graph, which corresponds to counting the number of \(p\)-stars in a graph when one has the ability to sample edges randomly. Our algorithm for a \((1\pm \epsilon)\)-multiplicative approximation of \(S_p\) has query and time complexities \(O\left(\frac{m\log\log n}{\epsilon^2 S_p^{1/p}}\right)\). Here, \(m=\frac{\sum x_i}{2}\) is the number of edges in the graph, or equivalently, half the number of records in the database table. Similarly, \(n\) is the number of vertices in the graph and the number of unique values in the database table. We also provide tight lower bounds (up to polylogarithmic factors) in almost all cases, even when \(\{xi\}\) is a degree sequence and one is allowed to use the structure of the graph to try to get a better estimate. We are not aware of any prior lower bounds on the problem of join selectivity estimation. For the graph problem, prior work which assumed the ability to sample only vertices uniformly gave algorithms with matching lower bounds (Gonen et al. in SIAM J Comput 25:1365–1411, 2011). With the ability to sample edges randomly, we show that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds in this prior work. For example, in the regime where \(S_p\le n\), and \(p=2\), our upper bound is \(\mathcal{\tilde{O}}(n/S_p^{1/2})\), in contrast to their \(\Omega(n/S_p^{1/3})\) lower bound when no random edge queries are available. In addition, we consider the problem of counting the number of directed paths of length two when the graph is directed. This problem is equivalent to estimating the selectivity of a join query between two distinct tables. We prove that the general version of this problem cannot be solved in sublinear time. However, when the ratio between in-degree and out-degree is bounded—or equivalently, when the ratio between the number of occurrences of values in the two columns being joined is bounded—we give a sublinear time algorithm via a reduction to the undirected case.

Common Development of Prisms, Anti-Prisms, Tetrahedra, and Wedges [pdf]

Amartya Shankha Biswas, Erik D Demaine

Canadian Conference on Computational Geometry (CCCG) 2017

We construct an uncountably infinite family of flat unfoldings, each of which can be folded into twelve distinct convex solids, while also tiling the plane.

Efficient Origami Construction of Orthogonal Terrains using Cross Section Evolution [pdf]

Amartya Shankha Biswas, Erik D Demaine, Jason S Ku

International Meeting on Origami in Science, Mathematics and Education (7OSME) 2017

We introduce a new method of origami construction, using cross section diagrams. Instead of beginning our construction from a 2-dimensional sheet of paper, we consider a 1-dimensional cross section moving forwards in time. We obtain conditions for the validity of a particular cross section evolution sequence, and prove that the resulting folded state is isometric to a flat sheet of paper. Subsequently, we use this machinery to design an efficient construction of orthogonal terrains, with arbitrary rational extrusion heights.