Terminology used in EXPLAIN PLAN in Amazon Redshift is briefly explained in this post.
- To get the explain plan of a query, include EXPLAIN in front of any query.
- Collecting statistics of the tables by analyzing them is important to get correct estimates in explain plan.
- Terminology used in EXPLAIN:
- STEP: Every individual operation is a step in explain plan.
- SEGMENT: Segments are the number of steps that can be done by a single process.
- STREAM: A collection of segments that always begin with scan or reading of data and ends with materialization or blocking up.
- LAST SEGMENT: The term last segment means the query returns the data. If the return set is aggregated or sorted, the intermediate data is sent to leader node from all compute nodes. Leader node collects the data and sends back to the requesting client.
- SEQUENTIAL SCAN: Also termed as scan. Data is read sequentially from beginning to end.
- MERGE JOIN: Also termed as mjoin. This is the fastest Redshift join. This is used for inner joins and outer joins that are both distributed and sorted on join keys.
- HASH JOIN: Also termed as hjoin. This is based on hashing the joining columns. Its faster than nested loop join.
- NESTED LOOP JOIN: Also termed as nloop. Its used for cross joins, joins with inequality conditions. Its the slowest join of all.
- AGGREGATE: Also termed as aggr. This keyword is used for scalar aggregation functions. Scalar agg. functions returns only one row and one column.
- HASHAGGREGATE: Also termed as aggr. This is used for unsorted grouped aggregate functions.
- GROUPAGGREGATE: Also termed as aggr.
- SORT: Also termed as sort. ORDER BY controls this sort.
- MERGE: Also termed as merge. This produces the final results based on intermediate sorted results derived from parallel operations.
- SetOp Except: Also termed as hjoin. This is only used for EXCEPT queries.
- HASH Intersect: Also termed as hjoin. This is used for INTERSECT queries.
- Append: Also termed as save. This is the append used with subquery scan to implement UNION and UNION ALL queries.
- LIMIT: Also termed as limit. This is used with LIMIT clause.
- MATERIALIZE: Also termed as save.
- UNIQUE: Also termed as unique. Used mostly when DISTINCT keyword is used.
- WINDOW: Also termed as window. This term means to compute window functions.
- Network (Broadcast): Also termed as bcast. This is a Broadcast that is considered an attribute of the Join Explain operators and steps.
- DS_BCAST_INNER: In joins, it means we are broadcasting the entire inner table to all the compute nodes.
- Network (Distribute) : Also termed as dist. This is used to distribute rows to compute nodes for parallel processing.
- DS_DIST_NONE: No Data moves. Since joining tables have same distkey as join keys, data resides on the same node.
- Network(Send to Leader) - Also termed as return. Sends the results to Leader node for furthur processing.
- INSERT(Using Results) - Also termed as insert. Inserts data.
- DELETE(Scan and Filter) - Also termed as delete. Deletes data.
- UPDATE(Scan and Filter) - Also termed as delete,insert. Update is actually both delete and then insert.
- COST 0.00 ..0.09: Costs are cumulative as we read up the plan. 0.00 is the cost to read first row and 0.09 is the cost to read all rows.
- ROWS: The expected number of rows to return.
- WIDTH: The estimated average row size (width) of all the rows, in bytes.
Visit my post on "Key highlights on SQL Functions, Commands, Aggregate Functions, OLAP functions, Date/Time functions, Queries" here.
This concept is a good way to enhance the knowledge.thanks for sharing..
ReplyDeleteRedShift Training
AWS RedShift Training
Amazon RedShift Online Training
Thank you for excellent article, Please refer below if you are looking for best training institute in hyderabad.
ReplyDeleteRedShift Training
AWS RedShift Training
Amazon RedShift Online Training
Thank you for excellent article, Please refer below if you are looking for best training institute in hyderabad.
ReplyDeleteRedShift Training
AWS RedShift Training
Amazon RedShift Online Training