Best Sql Cheat Sheet

Posted by admin on 1/3/2022

SQL Cheat Sheet, created by Tomi Mester from Data36.com. Tomi Mester is a data analyst and researcher. He has worked for Prezi, iZettle, and several smaller companies as an analyst and consultant. He is the author of the Data36 blog, where he publishes weekly posts and tutorials about data science, A/B testing, online research, and data coding.

SQL For Dummies Cheat Sheet, by Allen G. Taylor. This cheat sheet consists of several helpful tables and lists containing information that comes up repeatedly when working with SQL. In one place, you can get quick answers to many questions that frequently arise during SQL development.


This cheat sheet provides helpful tips and best practices for building dedicated SQL pool (formerly SQL DW) solutions.

The following graphic shows the process of designing a data warehouse with dedicated SQL pool (formerly SQL DW):

Queries and operations across tables

When you know in advance the primary operations and queries to be run in your data warehouse, you can prioritize your data warehouse architecture for those operations. These queries and operations might include:

  • Joining one or two fact tables with dimension tables, filtering the combined table, and then appending the results into a data mart.
  • Making large or small updates to your fact sales table.
  • Only appending data to your tables.

Knowing the types of operations in advance helps you optimize the design of your tables.

Data migration

First, load your data into Azure Data Lake Storage or Azure Blob Storage. Next, use the COPY statement to load your data into staging tables. Use the following configuration:

  • Distribution: Round Robin
  • Indexing: Heap
  • Partitioning: None
  • Resource class: largerc or xlargerc
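As a minimal sketch of the staging configuration above, the hypothetical Python helpers below only build T-SQL text; they do not connect to a pool. They assume the dedicated SQL pool table options DISTRIBUTION = ROUND_ROBIN and HEAP (leaving out a PARTITION clause means no partitioning) and role-based resource class assignment through sp_addrolemember. The table, column, and user names are made up.

```python
# Hypothetical helpers that only build T-SQL strings for a dedicated SQL pool;
# nothing here connects to Azure. Table, column, and user names are illustrative.

ALLOWED_LOAD_CLASSES = {"largerc", "xlargerc"}


def staging_table_ddl(table: str, columns: dict[str, str]) -> str:
    """Return DDL for a round-robin heap staging table with no partitioning."""
    col_defs = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {table}\n"
        f"(\n    {col_defs}\n)\n"
        "WITH (DISTRIBUTION = ROUND_ROBIN, HEAP);"
    )


def assign_load_resource_class(user: str, resource_class: str = "largerc") -> str:
    """Return the statement that adds a loading user to a resource class role."""
    if resource_class not in ALLOWED_LOAD_CLASSES:
        raise ValueError(f"expected one of {sorted(ALLOWED_LOAD_CLASSES)}")
    return f"EXEC sp_addrolemember '{resource_class}', '{user}';"


print(staging_table_ddl("stg.sales", {"sale_id": "INT", "sale_date": "DATE", "amount": "DECIMAL(18, 2)"}))
print(assign_load_resource_class("loader_user"))
```

The load itself would then run as loader_user, so that the COPY statement gets the larger memory grant.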

Learn more about data migration, data loading, and the Extract, Load, and Transform (ELT) process.

Distributed or replicated tables

Use the following strategies, depending on the table properties:

Replicated
  Great fit for:
  • Small dimension tables in a star schema with less than 2 GB of storage after compression (~5x compression)
  Watch out if:
  • Many write transactions are on the table (such as insert, upsert, delete, update)
  • You change Data Warehouse Units (DWU) provisioning frequently
  • You only use 2-3 columns but your table has many columns
  • You index a replicated table

Round Robin (default)
  Great fit for:
  • Temporary/staging table
  • No obvious joining key or good candidate column
  Watch out if:
  • Performance is slow due to data movement

Hash
  Great fit for:
  • Fact tables
  • Large dimension tables
  Watch out if:
  • The distribution key cannot be updated

Tips:

  • Start with Round Robin, but aspire to a hash distribution strategy to take advantage of a massively parallel architecture.
  • Make sure that common hash keys have the same data format.
  • Don't distribute on varchar format.
  • Dimension tables with a common hash key to a fact table with frequent join operations can be hash distributed.
  • Use sys.dm_pdw_nodes_db_partition_stats to analyze any skewness in the data.
  • Use sys.dm_pdw_request_steps to analyze data movements behind queries and to monitor the time that broadcast and shuffle operations take. This is helpful when reviewing your distribution strategy.
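To act on the skew tip, you can compare row counts across a table's 60 distributions (the per-distribution counts that DBCC PDW_SHOWSPACEUSED reports are one source). The Python sketch below uses one common measure: how far the largest distribution sits above the average, as a percentage of the largest. Both the formula and the 10 percent threshold are assumptions to adjust, not an official rule.

```python
def skew_percent(rows_per_distribution: list[int]) -> float:
    """Skew as 100 * (max - avg) / max; 0.0 means rows are spread perfectly evenly."""
    if not rows_per_distribution:
        raise ValueError("need at least one distribution")
    largest = max(rows_per_distribution)
    if largest == 0:
        return 0.0  # empty table: nothing to be skewed
    average = sum(rows_per_distribution) / len(rows_per_distribution)
    return 100.0 * (largest - average) / largest


def is_skewed(rows_per_distribution: list[int], threshold_percent: float = 10.0) -> bool:
    """Flag a table whose skew exceeds an (assumed) tolerance."""
    return skew_percent(rows_per_distribution) > threshold_percent


even = [1_000_000] * 60                 # well-chosen hash key
hot = [1_000_000] * 59 + [5_000_000]    # one "hot" distribution
print(round(skew_percent(even), 1), is_skewed(even))
print(round(skew_percent(hot), 1), is_skewed(hot))
```

A single hot distribution dominates the measure because every parallel step waits for the slowest distribution to finish.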

Learn more about replicated tables and distributed tables.

Index your table

Indexing is helpful for reading tables quickly. You can choose among several index types, depending on your needs:

Heap
  Great fit for:
  • Staging/temporary table
  • Small tables with small lookups
  Watch out if:
  • Any lookup scans the full table

Clustered index
  Great fit for:
  • Tables with up to 100 million rows
  • Large tables (more than 100 million rows) with only 1-2 columns heavily used
  Watch out if:
  • Used on a replicated table
  • You have complex queries involving multiple join and Group By operations
  • You make updates on the indexed columns: it takes memory

Clustered columnstore index (CCI) (default)
  Great fit for:
  • Large tables (more than 100 million rows)
  Watch out if:
  • Used on a replicated table
  • You make massive update operations on your table
  • You overpartition your table: row groups do not span across different distribution nodes and partitions

Tips:


  • On top of a clustered index, you might want to add a nonclustered index to a column heavily used for filtering.
  • Be careful how you manage the memory on a table with CCI. When you load data, you want the user (or the query) to benefit from a large resource class. Make sure to avoid trimming and creating many small compressed row groups.
  • On Gen2, CCI tables are cached locally on the compute nodes to maximize performance.
  • For CCI, slow performance can happen due to poor compression of your row groups. If this occurs, rebuild or reorganize your CCI. You want at least 100,000 rows per compressed row group. The ideal is 1 million rows in a row group.
  • Based on the incremental load frequency and size, you want to automate when you reorganize or rebuild your indexes. Spring cleaning is always helpful.
  • Be strategic when you want to trim a row group. How large are the open row groups? How much data do you expect to load in the coming days?
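The row group thresholds in these tips lend themselves to a simple check. This Python sketch assumes you already have the row count of each compressed row group for a table (for example, queried from the columnstore row group DMVs). It buckets the counts against the 100,000-row floor and the 1-million-row target above. The 20 percent trigger for maintenance is an arbitrary assumption.

```python
MIN_HEALTHY_ROWS = 100_000   # floor from the tip above
IDEAL_ROWS = 1_000_000       # target from the tip above


def rowgroup_report(row_counts: list[int]) -> dict[str, int]:
    """Bucket compressed row groups by how many rows each one holds."""
    report = {"undersized": 0, "acceptable": 0, "near_ideal": 0}
    for rows in row_counts:
        if rows < MIN_HEALTHY_ROWS:
            report["undersized"] += 1
        elif rows < IDEAL_ROWS:
            report["acceptable"] += 1
        else:
            report["near_ideal"] += 1
    return report


def needs_maintenance(row_counts: list[int], max_undersized_share: float = 0.2) -> bool:
    """Suggest a REBUILD/REORGANIZE when too many row groups are undersized (assumed threshold)."""
    if not row_counts:
        return False
    return rowgroup_report(row_counts)["undersized"] / len(row_counts) > max_undersized_share


groups = [1_048_576, 1_048_576, 950_000, 40_000, 12_000]
print(rowgroup_report(groups))
print(needs_maintenance(groups))
```

Running a check like this after each incremental load is one way to automate the "when to reorganize or rebuild" decision described in the tips.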

Learn more about indexes.

Partitioning

You might partition your table when you have a large fact table (greater than 1 billion rows). In 99 percent of cases, the partition key should be based on date. Be careful to not overpartition, especially when you have a clustered columnstore index.

With staging tables that require ELT, you can benefit from partitioning, because it facilitates data lifecycle management.
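The overpartitioning warning can be checked with simple arithmetic. A dedicated SQL pool spreads every table across 60 distributions, and a clustered columnstore index builds separate row groups for each partition in each distribution, so each partition in each distribution holds roughly total rows / (60 × partitions). The sketch below compares that figure with the 1-million-row target from the indexing tips. The table size is hypothetical.

```python
DISTRIBUTIONS = 60          # dedicated SQL pool spreads each table over 60 distributions
TARGET_ROWS = 1_000_000     # ideal compressed row group size from the indexing tips


def rows_per_partition_per_distribution(total_rows: int, partitions: int) -> float:
    """Approximate rows that land in one partition of one distribution."""
    return total_rows / (DISTRIBUTIONS * partitions)


def max_partitions(total_rows: int) -> int:
    """Largest partition count that still leaves ~TARGET_ROWS per distribution per partition."""
    return max(1, total_rows // (DISTRIBUTIONS * TARGET_ROWS))


rows = 2_000_000_000  # hypothetical 2-billion-row fact table
print(rows_per_partition_per_distribution(rows, 36))  # monthly partitions over 3 years
print(max_partitions(rows))
```

Here 36 monthly partitions already drop each partition slightly below the 1-million-row target, which is why coarser (for example, yearly or quarterly) date partitions are often the safer choice for CCI tables.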

Learn more about partitions.


Incremental load

If you're going to incrementally load your data, first make sure that you allocate larger resource classes to loading your data. This is particularly important when loading into tables with clustered columnstore indexes. See resource classes for further details.

We recommend using PolyBase and ADF V2 for automating your ELT pipelines into your data warehouse.

For a large batch of updates in your historical data, consider using CREATE TABLE AS SELECT (CTAS) to write the data you want to keep into a new table rather than using INSERT, UPDATE, and DELETE.
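In dedicated SQL pool this pattern is a CTAS statement (with its own WITH (DISTRIBUTION = ...) options) followed by RENAME OBJECT to swap the tables. The sketch below reproduces the same keep-the-rows-then-swap idea in SQLite through Python's sqlite3 module, because SQLite supports CREATE TABLE ... AS SELECT and ALTER TABLE ... RENAME TO. The fact_sales table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales (sale_id INTEGER, sale_year INTEGER, amount REAL);
    INSERT INTO fact_sales VALUES (1, 2019, 10.0), (2, 2020, 20.0), (3, 2021, 30.0), (4, 2022, 40.0);
""")

# Instead of a large DELETE, write only the rows to keep into a new table,
# then swap the new table in under the original name.
conn.executescript("""
    CREATE TABLE fact_sales_new AS
        SELECT sale_id, sale_year, amount FROM fact_sales WHERE sale_year >= 2021;
    ALTER TABLE fact_sales RENAME TO fact_sales_old;
    ALTER TABLE fact_sales_new RENAME TO fact_sales;
    DROP TABLE fact_sales_old;
""")

remaining = [row[0] for row in conn.execute("SELECT sale_id FROM fact_sales ORDER BY sale_id")]
print(remaining)  # [3, 4]
```

The design choice is that a bulk write of the surviving rows is typically cheaper in a distributed warehouse than logging many individual deletes or updates.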

Maintain statistics

It's important to update statistics as significant changes happen to your data. See update statistics to determine if significant changes have occurred. Updated statistics optimize your query plans. If you find that it takes too long to maintain all of your statistics, be more selective about which columns have statistics.

You can also define the frequency of the updates. For example, you might want to update date columns, where new values might be added, on a daily basis. You gain the most benefit by having statistics on columns involved in joins, columns used in the WHERE clause, and columns found in GROUP BY.

Learn more about statistics.

Resource class

Resource classes are used as a way to allocate memory to queries. If you need more memory to improve query or loading speed, you should allocate higher resource classes. On the flip side, using larger resource classes impacts concurrency. You want to take that into consideration before moving all of your users to a large resource class.

If you notice that queries take too long, check that your users do not run in large resource classes. Large resource classes consume many concurrency slots. They can cause other queries to queue up.

Finally, with Gen2 of dedicated SQL pool (formerly SQL DW), each resource class gets 2.5 times more memory than with Gen1.

Learn more how to work with resource classes and concurrency.

Lower your cost

A key feature of Azure Synapse is the ability to manage compute resources. You can pause your dedicated SQL pool (formerly SQL DW) when you're not using it, which stops the billing of compute resources. You can scale resources to meet your performance demands. To pause, use the Azure portal or PowerShell. To scale, use the Azure portal, PowerShell, T-SQL, or a REST API.

You can also autoscale at the times you choose by using Azure Functions.

Optimize your architecture for performance

We recommend considering SQL Database and Azure Analysis Services in a hub-and-spoke architecture. This solution can provide workload isolation between different user groups while also using advanced security features from SQL Database and Azure Analysis Services. This is also a way to provide limitless concurrency to your users.

Learn more about typical architectures that take advantage of dedicated SQL pool (formerly SQL DW) in Azure Synapse Analytics.

You can deploy your spokes in SQL databases from dedicated SQL pool (formerly SQL DW) in one click.

Download this 2-page SQL Basics Cheat Sheet in PDF or PNG format, print it out, and stick it to your desk.

The SQL Basics Cheat Sheet provides you with the syntax of all basic clauses, shows you how to write different conditions, and has examples.

You may also read the contents here:

SQL Basics Cheat Sheet

SQL

SQL, or Structured Query Language, is a language to talk to databases. It allows you to select specific data and to build complex reports. Today, SQL is a universal language of data. It is used in practically all technologies that process data.

SAMPLE DATA
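The sample tables did not survive extraction. As a stand-in, this sketch builds a small country and city schema in an in-memory SQLite database through Python's sqlite3 module so the queries in the following sections can be tried. The column names are inferred from those queries, and the rows are illustrative values, not official statistics.

```python
import sqlite3


def make_sample_db() -> sqlite3.Connection:
    """Build an in-memory database with illustrative country and city tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE country (
            id INTEGER PRIMARY KEY, name TEXT, population INTEGER, area INTEGER
        );
        CREATE TABLE city (
            id INTEGER PRIMARY KEY, name TEXT,
            country_id INTEGER REFERENCES country(id),
            population INTEGER, rating INTEGER
        );
        INSERT INTO country VALUES
            (1, 'France', 66600000, 640680),
            (2, 'Germany', 80700000, 357000);
        INSERT INTO city VALUES
            (1, 'Paris', 1, 2243000, 5),
            (2, 'Berlin', 2, 3460000, 3);
    """)
    return conn


conn = make_sample_db()
print(conn.execute("SELECT * FROM country").fetchall())
print(conn.execute("SELECT * FROM city").fetchall())
```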

QUERYING SINGLE TABLE

Fetch all columns from the country table:

Fetch id and name columns from the city table:

Fetch city names sorted by the rating column in the default ASCending order:

Fetch city names sorted by the rating column in the DESCending order:
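The four single-table queries above can be sketched as follows, run against hypothetical rows in SQLite through Python's sqlite3 module. The SQL text is standard and should look the same in most databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, rating INTEGER);
    INSERT INTO country VALUES (1, 'France'), (2, 'Germany');
    INSERT INTO city VALUES (1, 'Paris', 5), (2, 'Berlin', 3), (3, 'Madrid', 4);
""")

all_countries = conn.execute("SELECT * FROM country").fetchall()
id_and_name = conn.execute("SELECT id, name FROM city").fetchall()
# ASC is the default sort direction, so writing it out is optional.
ascending = [row[0] for row in conn.execute("SELECT name FROM city ORDER BY rating ASC")]
descending = [row[0] for row in conn.execute("SELECT name FROM city ORDER BY rating DESC")]

print(ascending)   # ['Berlin', 'Madrid', 'Paris']
print(descending)  # ['Paris', 'Madrid', 'Berlin']
```

Without ORDER BY, SQL makes no promise about row order, so only the sorted queries should be relied on for ordering.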

Aliases

Columns

Tables
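A short sketch of both kinds of alias, using hypothetical rows in SQLite via Python's sqlite3: a column alias renames a result column, and table aliases give each table a short name for qualifying columns in a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO country VALUES (1, 'France'), (2, 'Germany');
    INSERT INTO city VALUES (1, 'Paris', 1), (2, 'Berlin', 2);
""")

# Column alias: the result column is reported as city_name.
cur = conn.execute("SELECT name AS city_name FROM city")
column_names = [description[0] for description in cur.description]
print(column_names)  # ['city_name']

# Table aliases: ci and co disambiguate the two name columns.
pairs = conn.execute("""
    SELECT co.name, ci.name
    FROM city AS ci
    JOIN country AS co ON ci.country_id = co.id
    ORDER BY ci.id
""").fetchall()
print(pairs)  # [('France', 'Paris'), ('Germany', 'Berlin')]
```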

FILTERING THE OUTPUT

COMPARISON OPERATORS

Fetch names of cities that have a rating above 3:

Fetch names of cities that are neither Berlin nor Madrid:
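Both comparison filters can be sketched against hypothetical rows in SQLite through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, rating INTEGER);
    INSERT INTO city VALUES (1, 'Paris', 5), (2, 'Berlin', 3), (3, 'Madrid', 4), (4, 'Lisbon', 2);
""")


def names(sql: str) -> list[str]:
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


above_3 = names("SELECT name FROM city WHERE rating > 3 ORDER BY id")
# != (or <>) excludes one value; AND requires both exclusions to hold.
not_berlin_or_madrid = names(
    "SELECT name FROM city WHERE name != 'Berlin' AND name != 'Madrid' ORDER BY id"
)
print(above_3)               # ['Paris', 'Madrid']
print(not_berlin_or_madrid)  # ['Paris', 'Lisbon']
```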

TEXT OPERATORS

Fetch names of cities that start with a 'P' or end with an 's':

Fetch names of cities that start with any letter followed by 'ublin' (like Dublin in Ireland or Lublin in Poland):
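The two LIKE patterns can be sketched against hypothetical rows in SQLite via Python's sqlite3. Note that SQLite's LIKE is case-insensitive for ASCII letters by default, so 'P%' would also match a name starting with 'p'; other databases may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO city VALUES
        (1, 'Paris'), (2, 'Berlin'), (3, 'Athens'), (4, 'Dublin'), (5, 'Lublin'), (6, 'Prague');
""")


def names(sql: str) -> list[str]:
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# % matches any sequence of characters, including an empty one.
p_or_s = names("SELECT name FROM city WHERE name LIKE 'P%' OR name LIKE '%s' ORDER BY id")
# _ matches exactly one character.
ublin = names("SELECT name FROM city WHERE name LIKE '_ublin' ORDER BY id")
print(p_or_s)  # ['Paris', 'Athens', 'Prague']
print(ublin)   # ['Dublin', 'Lublin']
```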

OTHER OPERATORS

Fetch names of cities that have a population between 500K and 5M:

Fetch names of cities that have a rating value (the rating is not NULL):

Fetch names of cities that are in countries with IDs 1, 4, 7, or 8:
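The BETWEEN, IS NOT NULL, and IN filters can be sketched against hypothetical city rows (illustrative populations) in SQLite through Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER,
                       population INTEGER, rating INTEGER);
    INSERT INTO city VALUES
        (1, 'Paris', 1, 2200000, 5),
        (2, 'Berlin', 2, 3500000, 3),
        (3, 'Lublin', 4, 340000, NULL),
        (4, 'Tokyo', 7, 14000000, 4),
        (5, 'Lyon', 1, 520000, NULL);
""")


def names(sql: str) -> list[str]:
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# BETWEEN includes both end values.
mid_sized = names("SELECT name FROM city WHERE population BETWEEN 500000 AND 5000000 ORDER BY id")
# Comparisons with NULL yield NULL (unknown), so use IS NULL / IS NOT NULL, never = or !=.
rated = names("SELECT name FROM city WHERE rating IS NOT NULL ORDER BY id")
in_listed_countries = names("SELECT name FROM city WHERE country_id IN (1, 4, 7, 8) ORDER BY id")

print(mid_sized)            # ['Paris', 'Berlin', 'Lyon']
print(rated)                # ['Paris', 'Berlin', 'Tokyo']
print(in_listed_countries)  # ['Paris', 'Lublin', 'Tokyo', 'Lyon']
```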

QUERYING MULTIPLE TABLES

INNER JOIN

JOIN (or explicitly INNER JOIN) returns rows that have matching values in both tables.
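A minimal INNER JOIN sketch in SQLite via Python's sqlite3, with hypothetical rows: the city without a country and the country without a city both drop out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO country VALUES (1, 'France'), (2, 'Germany'), (3, 'Iceland');
    INSERT INTO city VALUES (1, 'Paris', 1), (2, 'Berlin', 2), (3, 'Atlantis', NULL);
""")

# Only rows with a match on both sides are returned.
rows = conn.execute("""
    SELECT city.name, country.name
    FROM city
    JOIN country ON city.country_id = country.id
    ORDER BY city.id
""").fetchall()
print(rows)  # [('Paris', 'France'), ('Berlin', 'Germany')]
```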

LEFT JOIN

LEFT JOIN returns all rows from the left table with corresponding rows from the right table. If there's no matching row, NULLs are returned as values from the right table.
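The same hypothetical rows with LEFT JOIN, in SQLite via Python's sqlite3: the unmatched city keeps its row with None (NULL) for the country. The second query swaps the table order, which is how a RIGHT JOIN can be written as a LEFT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO country VALUES (1, 'France'), (2, 'Germany'), (3, 'Iceland');
    INSERT INTO city VALUES (1, 'Paris', 1), (2, 'Berlin', 2), (3, 'Atlantis', NULL);
""")

# Every city appears; a missing country comes back as NULL (None in Python).
all_cities = conn.execute("""
    SELECT city.name, country.name
    FROM city
    LEFT JOIN country ON city.country_id = country.id
    ORDER BY city.id
""").fetchall()
print(all_cities)  # [('Paris', 'France'), ('Berlin', 'Germany'), ('Atlantis', None)]

# city RIGHT JOIN country is equivalent to country LEFT JOIN city.
all_countries = conn.execute("""
    SELECT city.name, country.name
    FROM country
    LEFT JOIN city ON city.country_id = country.id
    ORDER BY country.id
""").fetchall()
print(all_countries)  # [('Paris', 'France'), ('Berlin', 'Germany'), (None, 'Iceland')]
```

SQLite added RIGHT and FULL OUTER JOIN only in version 3.39.0, so the swapped LEFT JOIN form is the portable choice when the bundled SQLite may be older.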


RIGHT JOIN

RIGHT JOIN returns all rows from the right table with corresponding rows from the left table. If there's no matching row, NULLs are returned as values from the left table.

FULL JOIN

FULL JOIN (or explicitly FULL OUTER JOIN) returns all rows from both tables. If a row has no match in the other table, NULLs are returned as values from that table.

CROSS JOIN

CROSS JOIN returns all possible combinations of rows from both tables. There are two syntaxes available.
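Both CROSS JOIN syntaxes, the explicit keyword and the comma-separated table list, can be sketched in SQLite via Python's sqlite3 with hypothetical rows. Two cities times two countries gives four combinations either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO country VALUES (1, 'France'), (2, 'Germany');
    INSERT INTO city VALUES (1, 'Paris'), (2, 'Berlin');
""")

explicit = conn.execute(
    "SELECT city.name, country.name FROM city CROSS JOIN country ORDER BY city.id, country.id"
).fetchall()
# Listing tables separated by commas, with no join condition, is the older equivalent syntax.
implicit = conn.execute(
    "SELECT city.name, country.name FROM city, country ORDER BY city.id, country.id"
).fetchall()

print(explicit)
# [('Paris', 'France'), ('Paris', 'Germany'), ('Berlin', 'France'), ('Berlin', 'Germany')]
```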

NATURAL JOIN

NATURAL JOIN will join tables by all columns with the same name.

In the city and country tables, the shared columns are id and name, so NATURAL JOIN matches rows on city.id = country.id and city.name = country.name.
NATURAL JOIN is very rarely used in practice.
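To see why NATURAL JOIN is rare, this sketch runs it in SQLite (via Python's sqlite3) on two hypothetical table pairs: one where the tables share both id and name, so a row must match on both and nothing comes back, and one where the only shared column is the intended join key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO country VALUES (1, 'France'), (2, 'Germany');
    INSERT INTO city VALUES (1, 'Paris', 1), (2, 'Berlin', 2);

    -- A second pair of tables where only the key column shares a name.
    CREATE TABLE country2 (country_id INTEGER PRIMARY KEY, country_name TEXT);
    CREATE TABLE city2 (city_id INTEGER PRIMARY KEY, city_name TEXT, country_id INTEGER);
    INSERT INTO country2 VALUES (1, 'France'), (2, 'Germany');
    INSERT INTO city2 VALUES (1, 'Paris', 1), (2, 'Berlin', 2);
""")

# Shared columns are id AND name, so a row matches only if both are equal.
same_names = conn.execute("SELECT * FROM city NATURAL JOIN country").fetchall()
print(same_names)  # []

# Here country_id is the only shared column, which is the intended key.
key_only = conn.execute(
    "SELECT city_name, country_name FROM city2 NATURAL JOIN country2 ORDER BY city_id"
).fetchall()
print(key_only)  # [('Paris', 'France'), ('Berlin', 'Germany')]
```

Because adding a same-named column to either table silently changes the join condition, an explicit JOIN ... ON is usually the safer choice.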

AGGREGATION AND GROUPING

GROUP BY groups together rows that have the same values in specified columns. It computes summaries (aggregates) for each unique combination of values.

AGGREGATE FUNCTIONS

  • avg(expr) − average value for rows within the group
  • count(expr) − count of values for rows within the group
  • max(expr) − maximum value within the group
  • min(expr) − minimum value within the group
  • sum(expr) − sum of values within the group
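This sketch applies every aggregate from the list in one GROUP BY over hypothetical city rows (SQLite via Python's sqlite3). COUNT(rating) and AVG(rating) skip the NULL rating, while COUNT(*) counts every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER,
                       population INTEGER, rating INTEGER);
    INSERT INTO city VALUES
        (1, 'Paris', 1, 2200000, 5), (2, 'Lyon', 1, 500000, 4),
        (3, 'Berlin', 2, 3500000, 3), (4, 'Munich', 2, 1500000, NULL);
""")

# One output row per country_id, with each aggregate computed over that group.
summary = conn.execute("""
    SELECT country_id,
           COUNT(*), COUNT(rating), AVG(rating),
           MIN(population), MAX(population), SUM(population)
    FROM city
    GROUP BY country_id
    ORDER BY country_id
""").fetchall()
for row in summary:
    print(row)
# (1, 2, 2, 4.5, 500000, 2200000, 2700000)
# (2, 2, 1, 3.0, 1500000, 3500000, 5000000)
```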

EXAMPLE QUERIES

Find out the number of cities:

Find out the number of cities with non-null ratings:

Find out the number of distinct country values:

Find out the smallest and the greatest country populations:

Find out the total population of cities in respective countries:

Find out the average rating for cities in respective countries if the average is above 3.0:
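The six example queries above can be sketched against hypothetical country and city rows (illustrative populations) in SQLite through Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER,
                       population INTEGER, rating INTEGER);
    INSERT INTO country VALUES (1, 'France', 67000000), (2, 'Germany', 83000000), (3, 'Ireland', 5000000);
    INSERT INTO city VALUES
        (1, 'Paris', 1, 2200000, 5), (2, 'Lyon', 1, 500000, 4),
        (3, 'Berlin', 2, 3500000, 3), (4, 'Munich', 2, 1500000, NULL),
        (5, 'Dublin', 3, 600000, NULL);
""")


def one(sql: str) -> tuple:
    """Return the single result row of an aggregate query."""
    return conn.execute(sql).fetchone()


city_count = one("SELECT COUNT(*) FROM city")[0]                 # every row
rated_count = one("SELECT COUNT(rating) FROM city")[0]           # non-NULL ratings only
distinct_countries = one("SELECT COUNT(DISTINCT country_id) FROM city")[0]
smallest, largest = one("SELECT MIN(population), MAX(population) FROM country")
totals = conn.execute(
    "SELECT country_id, SUM(population) FROM city GROUP BY country_id ORDER BY country_id"
).fetchall()
# HAVING filters whole groups after aggregation; WHERE cannot reference AVG(rating).
well_rated = conn.execute(
    "SELECT country_id, AVG(rating) FROM city GROUP BY country_id HAVING AVG(rating) > 3.0"
).fetchall()

print(city_count, rated_count, distinct_countries)  # 5 3 3
print(smallest, largest)                            # 5000000 83000000
print(totals)      # [(1, 2700000), (2, 5000000), (3, 600000)]
print(well_rated)  # [(1, 4.5)]
```

Country 3 drops out of the HAVING result because the average of only NULL ratings is NULL, and NULL > 3.0 is not true.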

SUBQUERIES

A subquery is a query that is nested inside another query, or inside another subquery. There are different types of subqueries.


SINGLE VALUE

The simplest subquery returns exactly one column and exactly one row. It can be used with comparison operators =, <, <=, >, or >=.

This query finds cities with the same rating as Paris:
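A sketch of that single-value subquery in SQLite via Python's sqlite3, with hypothetical ratings. The inner query returns Paris's rating, and the outer query compares every city against it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, rating INTEGER);
    INSERT INTO city VALUES (1, 'Paris', 5), (2, 'Berlin', 3), (3, 'Rome', 5), (4, 'Madrid', 4);
""")

same_rating_as_paris = [row[0] for row in conn.execute("""
    SELECT name
    FROM city
    WHERE rating = (
        SELECT rating FROM city WHERE name = 'Paris'
    )
    ORDER BY id
""")]
print(same_rating_as_paris)  # ['Paris', 'Rome']
```

If the inner query returned several rows, SQLite would silently use the first one, while many other databases raise an error, so keep such subqueries to exactly one row.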

MULTIPLE VALUES

A subquery can also return multiple columns or multiple rows. Such subqueries can be used with operators IN, EXISTS, ALL, or ANY.

This query finds cities in countries that have a population above 20M:
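A sketch of that multi-row subquery with IN, in SQLite via Python's sqlite3 and hypothetical rows (illustrative populations): the inner query returns the IDs of large countries, and the outer query keeps cities whose country_id is in that list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT, population INTEGER);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER);
    INSERT INTO country VALUES (1, 'France', 67000000), (2, 'Germany', 83000000), (3, 'Ireland', 5000000);
    INSERT INTO city VALUES (1, 'Paris', 1), (2, 'Berlin', 2), (3, 'Dublin', 3), (4, 'Cork', 3);
""")

in_large_countries = [row[0] for row in conn.execute("""
    SELECT name
    FROM city
    WHERE country_id IN (
        SELECT id FROM country WHERE population > 20000000
    )
    ORDER BY id
""")]
print(in_large_countries)  # ['Paris', 'Berlin']
```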

CORRELATED

A correlated subquery refers to the tables introduced in the outer query. A correlated subquery depends on the outer query. It cannot be run independently from the outer query.

This query finds cities with a population greater than the average population in the country:

This query finds countries that have at least one city:
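Both correlated subqueries can be sketched in SQLite via Python's sqlite3 with hypothetical rows. In each case the inner query refers to a column of the current outer row (main_city.country_id or country.id), so it is re-evaluated per outer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT, country_id INTEGER, population INTEGER);
    INSERT INTO country VALUES (1, 'France'), (2, 'Germany'), (3, 'Ireland'), (4, 'Iceland');
    INSERT INTO city VALUES
        (1, 'Paris', 1, 2200000), (2, 'Lyon', 1, 500000),
        (3, 'Berlin', 2, 3500000), (4, 'Munich', 2, 1500000),
        (5, 'Dublin', 3, 600000);
""")


def names(sql: str) -> list[str]:
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# Cities above the average population of their own country.
above_country_avg = names("""
    SELECT main_city.name
    FROM city main_city
    WHERE main_city.population > (
        SELECT AVG(average_city.population)
        FROM city average_city
        WHERE average_city.country_id = main_city.country_id
    )
    ORDER BY main_city.id
""")

# Countries with at least one city: EXISTS is true if the subquery returns any row.
with_cities = names("""
    SELECT name
    FROM country
    WHERE EXISTS (
        SELECT * FROM city WHERE city.country_id = country.id
    )
    ORDER BY id
""")

print(above_country_avg)  # ['Paris', 'Berlin']
print(with_cities)        # ['France', 'Germany', 'Ireland']
```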

SET OPERATIONS

Set operations are used to combine the results of two or more queries into a single result. The combined queries must return the same number of columns and compatible data types. The names of the corresponding columns can be different.

UNION

UNION combines the results of two result sets and removes duplicates. UNION ALL doesn't remove duplicate rows.

This query displays German cyclists together with German skaters:
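A sketch of UNION and UNION ALL in SQLite via Python's sqlite3, using hypothetical cycling and skating tables with a country code column ('DE' for Germany). One athlete appears in both tables, so only UNION ALL keeps the duplicate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cycling (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE skating (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO cycling VALUES (1, 'Anke', 'DE'), (2, 'Bernd', 'DE'), (3, 'Piotr', 'PL');
    INSERT INTO skating VALUES (1, 'Anke', 'DE'), (2, 'Clara', 'DE'), (3, 'Ola', 'PL');
""")


def names(sql: str) -> list[str]:
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# The final ORDER BY sorts the combined result, not just the second query.
union = names("""
    SELECT name FROM cycling WHERE country = 'DE'
    UNION
    SELECT name FROM skating WHERE country = 'DE'
    ORDER BY name
""")
union_all = names("""
    SELECT name FROM cycling WHERE country = 'DE'
    UNION ALL
    SELECT name FROM skating WHERE country = 'DE'
    ORDER BY name
""")
print(union)      # ['Anke', 'Bernd', 'Clara']
print(union_all)  # ['Anke', 'Anke', 'Bernd', 'Clara']
```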

INTERSECT

INTERSECT returns only rows that appear in both result sets.


This query displays German cyclists who are also German skaters at the same time:
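A sketch of INTERSECT on the same hypothetical cycling and skating tables, in SQLite via Python's sqlite3: only the athlete present in both German result sets survives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cycling (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE skating (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO cycling VALUES (1, 'Anke', 'DE'), (2, 'Bernd', 'DE'), (3, 'Piotr', 'PL');
    INSERT INTO skating VALUES (1, 'Anke', 'DE'), (2, 'Clara', 'DE'), (3, 'Ola', 'PL');
""")

both = [row[0] for row in conn.execute("""
    SELECT name FROM cycling WHERE country = 'DE'
    INTERSECT
    SELECT name FROM skating WHERE country = 'DE'
    ORDER BY name
""")]
print(both)  # ['Anke']
```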

EXCEPT

EXCEPT returns only the rows that appear in the first result set but do not appear in the second result set.

This query displays German cyclists unless they are also German skaters at the same time:
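A sketch of EXCEPT on the same hypothetical tables, in SQLite via Python's sqlite3: German cyclists remain unless they also appear among German skaters. Unlike UNION and INTERSECT, swapping the two queries changes the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cycling (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE skating (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO cycling VALUES (1, 'Anke', 'DE'), (2, 'Bernd', 'DE'), (3, 'Piotr', 'PL');
    INSERT INTO skating VALUES (1, 'Anke', 'DE'), (2, 'Clara', 'DE'), (3, 'Ola', 'PL');
""")

cyclists_only = [row[0] for row in conn.execute("""
    SELECT name FROM cycling WHERE country = 'DE'
    EXCEPT
    SELECT name FROM skating WHERE country = 'DE'
    ORDER BY name
""")]
# Reversing the operands gives skaters who do not cycle instead.
skaters_only = [row[0] for row in conn.execute("""
    SELECT name FROM skating WHERE country = 'DE'
    EXCEPT
    SELECT name FROM cycling WHERE country = 'DE'
    ORDER BY name
""")]
print(cyclists_only)  # ['Bernd']
print(skaters_only)   # ['Clara']
```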

Try out the interactive SQL Basics course at LearnSQL.com, and check out our other SQL courses.
