2019
Support Vector Machines
These are my handwritten notes that explain the basics of Support Vector Machines (SVM), which are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis.
2016
Approximate Nearest Neighbors
This is a summary of the paper Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality by Indyk and Motwani.
Given a set of n points P in X = Rd, the nearest-neighbor (NN) problem is to find a point p in P that is closest to a query point q.
This problem is of importance to many applications involving similarity searching such as data compression, databases and data mining, information retrieval, image and video databases, machine learning, pattern recognition, and statistics and data analysis.
Suffering from the curse of dimensionality, existing algorithms either have query time linear in n and d, or have query time sub-linear in n and polynomial in d but exponential preprocessing cost nd, making them infeasible for a large d (e.g., d ≥ 100) which is common in practice.
It is observed that insisting on the absolute nearest neighbor is often an overkill, due to the heuristical nature of the use of distance metric and the selection of features in practice.
Therefore the paper in discussion aims to solve an approximate relaxation to nearest neighbor problem, namely the ϵ-nearest neighbor problem (ϵ-NN): to find a point p ∈ P that is an ϵ-approximate nearest neighbor of the query q, in that ∀p′ ∈ P, d(p, q) ≤ (1+ϵ)d(p′, q).
2014
Understanding Basic Concepts in Radiometry
Radiometry is a theoretical foundation for physically-based rendering algorithms.
It is a measurement of electromagnetic energy (of visible light in particular), and measures spatial (and angular) properties of light.
Use ray optics, hence largely a geometrical subject.
These slides:
- Provide an intuitive understanding of flux, irradiance, radiance
- Provide an intuitive understanding of reflection functions (BRDF)
- Point out common pitfalls of thinking, or sources of confusion along the way
- Short digressions into somewhat related topics
2011
A Type and Effect System for Atomicity
In this report, we summarize and analyze the PLDI 2003 paper A Type and Effect System for Atomicity by Flanagan and Qadeer.
We show that while this paper is the first to use static analysis to successfully verify a fundamental correctness property in multithreaded programs that is atomicity, its approach has limitations that should be, and indeed have been, addressed in future work.
There have been strong beliefs in recent years that parallel computing is the future for computing. As a consequence, multicore machines, and multithreaded software that can take advantage of such machines, have becoming increasingly more popular. Compared to serial programs, multithreaded programs are harder to test and debug. The main reason is that the interleaving of threads results in many complex execution scenarios and makes synchronization bugs non-deterministic, thus hard to reproduce and fix. One common source of bugs in parallel programs is race condition, where more than one threads try to access a shared variable simultaneously, and at least one of the accesses is a write. There have been works trying to detect race conditions using a variety of techniques. However, Flanagan and Qadeer argue that the absence of race conditions is neither necessary nor sufficient for a program to be correct. They advocate checking for a stronger non-interference property called atomicity, where atomic methods can be assumed to execute in “one step”, without interferences of methods from other threads. To that end, the authors propose a type system to specify and verify the atomicity of methods in multithreaded Java programs. They show that their approach successfully finds atomicity violations in the JDK’s codebase that can lead to crashes. This report elaborates more on the importance of atomicity, and discusses race conditions and atomicity in a larger context. We also give reasons to justify the use of a type system instead of other static analysis techniques. In the same spirit, we states the advantages and disadvantages of the static analysis approach taken by this paper compared with other methods such as dynamic analysis and model checking used in later papers. Finally, we explain in detail and give examples to illustrate the limitations of the proposed type system when verifying atomicity in practice.
There have been strong beliefs in recent years that parallel computing is the future for computing. As a consequence, multicore machines, and multithreaded software that can take advantage of such machines, have becoming increasingly more popular. Compared to serial programs, multithreaded programs are harder to test and debug. The main reason is that the interleaving of threads results in many complex execution scenarios and makes synchronization bugs non-deterministic, thus hard to reproduce and fix. One common source of bugs in parallel programs is race condition, where more than one threads try to access a shared variable simultaneously, and at least one of the accesses is a write. There have been works trying to detect race conditions using a variety of techniques. However, Flanagan and Qadeer argue that the absence of race conditions is neither necessary nor sufficient for a program to be correct. They advocate checking for a stronger non-interference property called atomicity, where atomic methods can be assumed to execute in “one step”, without interferences of methods from other threads. To that end, the authors propose a type system to specify and verify the atomicity of methods in multithreaded Java programs. They show that their approach successfully finds atomicity violations in the JDK’s codebase that can lead to crashes. This report elaborates more on the importance of atomicity, and discusses race conditions and atomicity in a larger context. We also give reasons to justify the use of a type system instead of other static analysis techniques. In the same spirit, we states the advantages and disadvantages of the static analysis approach taken by this paper compared with other methods such as dynamic analysis and model checking used in later papers. Finally, we explain in detail and give examples to illustrate the limitations of the proposed type system when verifying atomicity in practice.
2010
A Survey of Three DHT Systems:
CAN, Pastry, and Chord
CAN, Pastry, and Chord
One of the key problems in P2P is efficient lookup.
Given a key, we want to locate the nodes that store the data associated with the key.
A system that provides that hash-table-like service is called a distributed hash table (DHT).
In this report, we study three early, notable DHT systems out of the many that have been proposed in recent years, namely CAN [1], Pastry [2], and Chord [3].
We look at how each of them solve the basic lookup problem, compare their performances and scalabilities, and assess them with regards to important criteria such as load balancing, data availability, and robustness.
Along the way, we also comment on their shortcomings and propose possible research directions to overcome those drawbacks.