Tai Tran

Keep in mind: sequence read is much faster than random read

Because it minimizes the number of seek operations (moving the read/write head to a specific location on the storage device) and maximizes the amount of data transferred per operation

What is index ?

Indexes are entry points for tables
Index used to locate the tuples in the table
Index stored separately from table

Create Index :

create index idx_name on accounts(username);
SELECT relfilenode FROM pg_class WHERE relname LIKE 'idx_name';

Get physics file name of index:

index location Find index file location: find index location

Index types?

B-Tree Index:

A B-Tree index is composed of multiple levels of pages, where each page can store a number of index entries. The index entries are sorted by the values of the indexed column or columns, and each entry also contains a pointer to the location of the corresponding row in the table.

Supported Operators :

- Less than <
- Less than equal to <=
- Equal =
- Greater than equal to >=
- Greater than >

Advantages :

A B-Tree index can handle various types of queries, such as equality, range, pattern matching, null checks, and sorting.
A B-Tree index is efficient and balanced, meaning that it does not waste space or become skewed.
A B-Tree index can support unique constraints and primary keys, by ensuring that no two index entries have the same value.

Limitations :

A B-Tree index cannot store values that are too large or complex, such as arrays or JSON objects.
A B-Tree index cannot support queries that involve operators other than those defined by its operator class.
A B-Tree index may require maintenance and reorganization to keep its performance optimal.

Hash Index

A Hash index is a type of index that PostgreSQL uses to store and retrieve data that can only be compared for equality. A Hash index stores a 32-bit hash code derived from the value of the indexed column, and a pointer to the location of the corresponding row in the table.

Supported Operators :

A Hash index can only handle queries that use the equal operator: =

Advantages:

A Hash index is smaller and faster than a B-Tree index for equality queries, because it does not need to store or compare the actual values of the indexed column.
A Hash index can handle any data type that has a hash function defined, even if the data type does not have a natural ordering.
A Hash index can support unique constraints and primary keys, by ensuring that no two index entries have the same hash code.

Limitations:

A Hash index cannot support queries that involve operators other than equality, such as range, pattern matching, or sorting.
A Hash index cannot store values that are too large or complex to be hashed, such as arrays or JSON objects.
A Hash index may require maintenance and reorganization to keep its performance optimal, especially if the data distribution changes over time.

BRIN Index

BRIN stands for Block Range Index, which means that the index stores summary information about a group of pages that are adjacent to each other in the table. For example, a table that has a column for the date of an order might have the rows sorted by date, so that the earlier orders are located earlier in the table. A BRIN index can store the minimum and maximum date values for each block range, and use them to quickly find the rows that match a given date or range of dates..

Supported Operators:

Range selection

Advantages:

A BRIN index is very small and fast, because it does not need to store or compare the actual values of the indexed column, only the summary information for each block range.
A BRIN index can handle any data type that has a linear sort order, even if the data type does not have a natural ordering.
A BRIN index can support unique constraints and primary keys, by ensuring that no two block ranges have the same summary values.

Limitations:

A BRIN index cannot support queries that involve operators other than equality or range, such as pattern matching or sorting.
A BRIN index cannot store values that are too large or complex to be summarized, such as arrays or JSON objects. A BRIN index may require maintenance and reorganization to keep its performance optimal, especially if the data distribution changes over time.

GIN Index

A GIN index is a type of index that PostgreSQL uses to store and retrieve data that can be decomposed into multiple components, such as arrays, JSON objects, or full text documents. GIN stands for Generalized Inverted Index, which means that the index stores a mapping from each component value (also called a key) to a list of locations (also called postings) where the value appears in the table.

Supported Operators:

-A GIN index can handle queries that involve operators that test for the presence or absence of a key in a column value, such as @>, <@, =, and &&. For example, if you have a GIN index on a column of type tsvector, which represents a full text document, you can use the @@ operator to search for documents that match a given query of type tsquery. The index will quickly find the documents that contain all the terms in the query, and then filter out any false positives by checking the actual column values.

Advantages:

A GIN index is very fast and accurate for queries that involve multiple keys, because it can use the intersection or union of the postings lists to find the matching rows.
A GIN index can support any data type that has a GIN operator class defined, which allows for extensibility and customization.
A GIN index can support unique constraints and primary keys, by ensuring that no two rows have the same set of keys.

Limitations:

A GIN index is larger and slower to update than other types of indexes, because it needs to store and maintain a lot of information for each key.
A GIN index cannot support queries that involve operators other than those defined by its operator class, such as range, pattern matching, or sorting.
A GIN index may require maintenance and reorganization to keep its performance optimal, especially if the data distribution changes over time.

GiST Index

A GiST index is a type of index that PostgreSQL uses to store and retrieve data that can be represented as a geometric shape, such as points, lines, polygons, or circles. GiST stands for Generalized Search Tree, which means that the index can support any data type that has a consistent way of defining the relationship between two values, such as overlap, contain, or distance.

Supported Operators:

- A GiST index can handle queries that involve operators that test for these relationships, such as &&, @>, <@, and <->.

Advantages:

A GiST index is very flexible and extensible, because it can support any data type that has a GiST operator class defined, which allows for customization and innovation.
A GiST index is very efficient and scalable, because it uses a balanced tree structure that adapts to the data distribution and minimizes the disk access.
A GiST index can support unique constraints and primary keys, by ensuring that no two values have the same geometric shape.

Limitations:

A GiST index is lossy, meaning that the index may produce false matches, and it is necessary to check the actual table row to eliminate such false matches. (PostgreSQL does this automatically when needed.) GiST indexes are lossy because each value is represented in the index by a bounding box that approximates its shape.
A GiST index cannot support queries that involve operators other than those defined by its operator class, such as equality, range, pattern matching, or sorting.
A GiST index may require maintenance and reorganization to keep its performance optimal, especially if the data distribution changes over time. time.

-A GIN index can handle queries that involve operators that test for the presence or absence of a key in a column value, such as @>, <@, =, and &&. For example, if you have a GIN index on a column of type tsvector, which represents a full text document, you can use the @@ operator to search for documents that match a given query of type tsquery. The index will quickly find the documents that contain all the terms in the query, and then filter out any false positives by checking the actual column values.

Serverless Blog

Deep dive Postgres Index

What is index ?

Index types?

Deep dive Postgres Index

What is index ?

Index types?