SQL (database)

PostgreSQL | BigQuery

 

General Syntax


SELECT [DISTINCT] column_list
FROM table_name
[WHERE condition]
[GROUP BY column_list]
[HAVING condition]
[ORDER BY column_list [ASC | DESC]]
[LIMIT row_count]
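A quick way to try the clause order above is Python's built-in sqlite3 module (used here only for illustration; the table and column names are made up for the example):

```python
# Demonstrate the general SELECT clause order with an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('east', 100), ('east', 50), ('west', 200), ('west', 10);
""")

# SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... ORDER BY ... LIMIT
rows = conn.execute("""
    SELECT DISTINCT region, SUM(amount) AS total
    FROM sales
    WHERE amount > 20
    GROUP BY region
    HAVING SUM(amount) > 100
    ORDER BY total DESC
    LIMIT 10
""").fetchall()
print(rows)  # [('west', 200), ('east', 150)]
conn.close()
```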

 

In Memory SQL DB - DuckDB

DuckDB is an open-source SQL database designed for analytical workloads.   It excels at data-intensive applications such as data science and machine learning. It is optimized for in-memory performance and designed for parallel processing, allowing for the use of multiple cores for increased performance.   DuckDB also supports advanced analytics features such as window functions and recursive queries. 

 

Database Markup Language — DBML

DBML is an open-source markup language for documenting data models and database schemas.  
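As a sketch of what DBML looks like, here is a hypothetical two-table model (the table and column names are illustrative only):

```dbml
// Hypothetical two-table data model
Table users {
  id int [pk, increment]
  email varchar [unique, not null]
}

Table posts {
  id int [pk, increment]
  user_id int [ref: > users.id]  // many-to-one reference to users
  title varchar
}
```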

 

Installing SQL Servers

MySQL Community Downloads

 

Tutorials

How to Create and Manipulate SQL Databases with Python

 

PostgreSQL

PostgreSQL, also known as Postgres, is a free and open-source relational database management system emphasizing extensibility and SQL compliance.   The limits on database and table sizes are substantial, as shown on the limits page.   Installers are available for Windows (install tutorial), Linux, macOS, and other platforms.   Online tutorials are available, as well as online documentation.  

Windows Installation

Follow the Windows OS install tutorial.   It will instruct you to download an installer for Windows.   Choose the latest version and then click the download icon for Windows x86-64.  

Using PostgreSQL with Python

PostgreSQL, like most SQL databases, is organized such that it contains default databases for administration ('postgres', 'template0', and 'template1' in the case of PostgreSQL), and you then create a new database for your project.   Your project database will then have one or more tables.   Each table has columns (fields) and rows containing the data.   An administrator-level user named 'postgres' is created by default.   You should create other database users with appropriate database access permissions.  

IMPORTANT: Do not modify or add tables to the default 'postgres', 'template0', or 'template1' databases.  

 

Create a new database named 'pytestdb' for your project from psql:


# create database pytestdb;
# \list

Make sure you include the semicolon ";" at the end of the SQL command 'create database'.  
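To create an additional, non-administrator database user as recommended above, the following psql commands are one way to do it (the role name and password here are placeholders, not values from the tutorial; run them while connected as 'postgres'):

```sql
-- Hypothetical role name and password; adjust the privileges to your needs.
CREATE USER pyuser WITH PASSWORD 'change-me';
GRANT ALL PRIVILEGES ON DATABASE pytestdb TO pyuser;
```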

Begin with this beginner Python-to-PostgreSQL connection tutorial.   Keep the password you created for the 'postgres' user in an eWallet.  

Stack Builder will run after the installation.   Choose the installation from the dropdown list ('PostgreSQL ##(x64) on port 5432').   A list of available applications will download and then be displayed.   You don't need any additional applications, so just click 'Cancel' and continue clicking through to exit.  

A new Windows menu folder named 'PostgreSQL' will be created by the installation; it includes the 'pgAdmin #' app and a 'SQL Shell (psql)' app.   Running the 'psql' app as described in the installation tutorial and clicking through the initial prompts will show that a default database named 'postgres' was created on port 5432.   Close the command prompt.  

Create a Python virtual environment and then use pip to install the package named 'psycopg2', also called psycopg.  


py -m pip install psycopg2
py -m pip install requests

Create the following script named 'db.py' in the Python virtual environment folder 'PostgreSQL' to connect to the PostgreSQL database, updating 'password=' with the password for user 'postgres':


#   Written by:  Mark W Kiehl
#   http://mechatronicsolutionsllc.com/
#   http://www.savvysolutions.info/savvycodesolutions/
#
#   Demonstrate using psycopg to create a table and then populate it with
#   data using the psycopg function psycopg2.extras.execute_batch.
#
#   Portions inspired and derived from the work of Haki Benita https://hakibenita.medium.com/fastest-way-to-load-data-into-postgresql-using-python-d2e6de8b2aaa
#
#   https://www.postgresqltutorial.com/postgresql-python/
#   https://www.psycopg.org/docs/index.html

# pip install psycopg2
import psycopg2
import psycopg2.extras  # for execute_batch

conn = psycopg2.connect(
    host="localhost",
    database="pytestdb",
    user="postgres",
    password="[your db password]",
)
conn.autocommit = True  #set autocommit=True so every command we execute will take effect immediately


def create_db_table(cur):
    """ Create the table 'test_a' using a cursor from the PostgreSQL connection """
    try:
        # execute a SQL statement to get the PostgreSQL version
        cur.execute('SELECT version()')
        # display the PostgreSQL database server version
        db_version = cur.fetchone()
        print(db_version)
        
        # Create a new table 'test_a'
        cur.execute("""
            DROP TABLE IF EXISTS test_a;
            CREATE UNLOGGED TABLE test_a (
                id                  INTEGER,
                description         TEXT,
                first_brewed        DATE,
                num_float           DECIMAL,
                num_int             INTEGER
            );
        """)

    except (Exception, psycopg2.DatabaseError) as error:
        print(error)
    finally:
        print('Created db table test_a')


def add_data_to_table(cur, list_data):
    # Psycopg2 provides a way to insert many rows at once using psycopg2.extras.execute_batch (FYI much faster than .executemany). 
    #print('type(list_data) {}'.format(type(list_data)))
    print('Inserting {} records into the db table test_a'.format(len(list_data)))    
    
    try:
        # connect to the PostgreSQL server
        print('Connecting to the PostgreSQL database...')
                
        psycopg2.extras.execute_batch(cur, """
            INSERT INTO test_a VALUES (
                %(id)s,
                %(description)s,
                %(first_brewed)s,
                %(num_float)s,
                %(num_int)s
            );
        """, list_data)    
        
        # execute a SQL statement to get the number of records in table test_a
        cur.execute('select count(*) from test_a;')
        rows = cur.fetchone()
        print('{} rows added to table test_a'.format(rows[0]))
        
    except (Exception, psycopg2.DatabaseError) as error:
        print('SQL ERROR: {}'.format(error))
    finally:
        print('add_data_to_table() done')    


from datetime import datetime


if __name__ == '__main__':
    # Create table test_a
    with conn.cursor() as cur:
        create_db_table(cur)
        # create some data as a list of dictionaries.
        # NOTE:  The key of the dictionary must match the table field (column) names.
        dic = { 'id': 1,
                'description': 'my first description',
                'first_brewed': datetime.now(),
                'num_float':    1.2345,
                'num_int':      32767
        }
        li = []
        li.append(dic)
        dic = { 'id': 2,
                'description': 'my second description',
                'first_brewed': datetime.now(),
                'num_float':    5432.1,
                'num_int':      -32767
        }
        li.append(dic)
        print(li)
        add_data_to_table(cur, li)
        if conn is not None:
            conn.close()

 

Big Query

BigQuery is Google's fully managed, serverless data warehouse that enables scalable analysis over petabytes of data.   It is a Platform as a Service (PaaS) that supports querying using ANSI SQL.   It also has built-in machine learning capabilities.   BigQuery stores data in a columnar format.  
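As a sketch of what a BigQuery query looks like, the following standard (ANSI) SQL runs against one of Google's public sample tables (the dataset and table are real public samples, but a Google Cloud project with BigQuery access is assumed):

```sql
-- Top five corpora by word count from the public Shakespeare sample table.
SELECT corpus, SUM(word_count) AS total_words
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus
ORDER BY total_words DESC
LIMIT 5;
```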

Automate Your BigQuery Schema Definitions With 5 Lines of Python

Marie Truong published a tutorial on Medium that demonstrates using BigQuery with Python to load two tables from API endpoints and then join the two tables.  

BigQuery 101: All the Basics You Need to Know

A Guide to Arrays and Structs in BigQuery

BigQuery API Client Libraries

A Crash Course in Google BigQuery