{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Database Queries"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{contents} Table of Contents\n",
    ":depth: 4\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Introduction: The History of the Structured Query Language (SQL)\n",
    "\n",
    "<div style= \"float:right;position: relative; padding: 10px\">\n",
    "<a href=\"https://xkcd.com/1409/\"><img src=\"https://imgs.xkcd.com/comics/query.png\" width=\"400\"></a>\n",
    "</div>\n",
    "\n",
    "The structured query language (SQL) was invented by Donald D. Chamberlin and Raymond F. Boyce in 1974. Chamberlain and Boyce were both young computer scientists working at the IBM T.J. Watson Research Center in Yorktown Heights, New York, and they met E. F. Codd at a research symposium that Codd organized there. Codd, four years prior, had published the [seminal article that defined the relational model](https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf) for databases. Codd's relational model is defined using relational algebra and relational calculus, two notational standards that Codd himself created to elaborate on set theory as applied specifically to data tables. One important property of set theory is that highly abstract mathematical expressions can be expressed in plain language. For example, consider the set $A$ of holidays in the United States during which banks are closed:\n",
    "\n",
    "$$ A = \\{\\text{New Year's Day}, \\text{Birthday of Martin Luther King, Jr.}, \\\\  \\text{Washington’s Birthday}, \\text{Memorial Day}, \\text{Independence Day}, \\\\  \\text{Labor Day}, \\text{Columbus Day}, \\text{Veterans Day},  \\\\ \\text{Thanksgiving Day}, \\text{Christmas}\\}$$\n",
    "\n",
    "Also consider the set $B$ of holidays in the United Kingdom during which banks are closed:\n",
    "\n",
    "$$ B = \\{\\text{New Year's Day}, \\text{Good Friday}, \\text{Easter Monday}, \\\\ \\text{Early May bank holiday}, \\text{Spring bank holiday}, \\\\\\text{Summer bank holiday}, \\text{Christmas}, \\text{Boxing Day}\\}$$\n",
    "\n",
    "The intersection between sets $A$ and $B$ is a set that consists of all elements that exist with both set $A$ and set $B$:\n",
    "\n",
    "$$ A \\cap B = \\{\\text{New Year's Day}, \\text{Christmas}\\}$$\n",
    "\n",
    "The notation $A\\cap B$ is a mathematical abstraction of an idea that can be expressed in plain-spoken language: $\\cap$ means \"and\", and $A\\cap B$ means $A$ *and* $B$, or all elements that are in both $A$ *and* $B$. Put another way, $A\\cap B$ is the set of all holidays during which banks are closed in both the United States *and* the United Kingdom. Likewise, every piece of set notation can be expressed semantically.\n",
    "\n",
    "Although Codd laid out the broad parameters of the relational model in mathematical terms, he did not design software or a physical architecture for a relational database. He explicitly left that work up to future research:\n",
    "\n",
    "> Many questions are raised and left unanswered. For example, only a few of the more important properties of the data sublanguage . . . are mentioned. Neither the purely linguistic details of such a language nor the implementation problems are discussed. Nevertheless, the material presented should be adequate for experienced systems programmers to visualize several approaches (p. 387).\n",
    "\n",
    "Chamberlin and Boyce took up the challenge of writing a programming language to implement Codd's relational model. [As Chamberlin explains](https://ieeexplore.ieee.org/document/6359709), their primary goal was to create a version of Codd's set-theoretical relational model that could be expressed in plain language:\n",
    "\n",
    "> The more difficult barrier was at the semantic level. The basic concepts of Codd's languages were adapted from set theory and symbolic logic. This was natural given Codd's background as a mathematician, but Ray and I hoped to design a relational language based on concepts that would be familiar to a wider population of users (p. 78).\n",
    "\n",
    "In short, the idea behind SQL is to implement Codd's abstract system of using logical statements with set theoretical notation to narrow down the specific records and features in a dataset that a user wishes to read or edit, but to phrase these operations in accessable, plain language. One of the best things about SQL is that once you are used to the language, it reads just like English. That said, Chamberlin admits that SQL \"has not proved to be as accessible to untrained users as Ray and I originally hoped\" ([p. 81](https://ieeexplore.ieee.org/document/6359709)).\n",
    "\n",
    "Another benefit of SQL is that this language is one of the most universal programming languages in existence. It is designed to work with database management systems on any platform, and it works seamlessly within Python, R, C, Java, Javascript, and so on. While the standards for languages and platforms change, SQL has been in continuous use for relational database management since the 1980s and shows no sign of becoming antiquated or being replaced. The SQL syntax exists outside of any individual DBMS, and is maintained by the [American National Standards Institute (ANSI)](https://en.wikipedia.org/wiki/American_National_Standards_Institute) and the [International Organization for Standardization (ISO)](https://en.wikipedia.org/wiki/International_Organization_for_Standardization), two non-profit organizations that facilitate the development of [voluntary consensus standards](https://en.wikipedia.org/wiki/Standardization) for things like programming languages and hardware. Despite the universality of SQL, however, different DBMSs use slightly different versions of SQL, adding some unique functionality in some cases, and failing to implement the entire SQL standard in others. MySQL for example [lacks the ability to perform a full join](https://stackoverflow.com/questions/4796872/how-to-do-a-full-outer-join-in-mysql). PostgreSQL distinguishes itself from other RDBMSs by striving to implement as much of the global SQL standard as possible. While there some important differences in the version of SQL used by different DBMSs, the differences generally apply to very specific situations and all implementations of SQL use mostly the same syntax and can do mostly the same work. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Declarative and Procedural Languages\n",
    "\n",
    "SQL is considered to be a [declarative language](https://en.wikipedia.org/wiki/Declarative_programming), which means that it defines the broad task that a particular computer system must carry out, but it does not define the mechanism through which the system completes the task. For example, SQL can tell a system to access two tables and join them together, but that command must tell a DBMS to access additional code that tells the system how exactly to search and operate on the rows and columns of each data table. A language that provides specific instructions to a system on how to carry out a task - by changing the system state in some way, including how the data exist in the system - is a [procedural language](https://en.wikipedia.org/wiki/Procedural_programming). The code that a procedural language uses to make these changes on the system is called [imperative code](https://en.wikipedia.org/wiki/Imperative_programming). A DBMS can be thought of as a function that takes declarative SQL code as an input, finds and runs the imperative code that carries out the declarative task, and returns the output. MySQL, for example, [uses imperative C and C++ code](https://dev.mysql.com/doc/refman/8.0/en/features.html) to carry out SQL queries."
   ]
  },
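  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The contrast between declarative and imperative code can be sketched with a small example. The code below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory database, and the tables `t1` and `t2` and their contents are invented for this purpose. The declarative version states which rows are wanted and leaves the method to the DBMS. The imperative version spells out one possible procedure, a nested loop, which is not necessarily the procedure a DBMS would choose.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Two small hypothetical tables, invented for this illustration\n",
    "conn = sqlite3.connect(':memory:')\n",
    "conn.execute('CREATE TABLE t1 (id INTEGER, x TEXT)')\n",
    "conn.execute('CREATE TABLE t2 (id INTEGER, y TEXT)')\n",
    "conn.executemany('INSERT INTO t1 VALUES (?, ?)', [(1, 'a'), (2, 'b'), (3, 'c')])\n",
    "conn.executemany('INSERT INTO t2 VALUES (?, ?)', [(2, 'B'), (3, 'C'), (4, 'D')])\n",
    "\n",
    "# Declarative: describe the result and let the DBMS decide how to compute it\n",
    "declarative = conn.execute(\n",
    "    'SELECT t1.id, t1.x, t2.y FROM t1 INNER JOIN t2 ON t1.id = t2.id ORDER BY t1.id'\n",
    ").fetchall()\n",
    "\n",
    "# Imperative: state each step of one possible procedure (a nested loop)\n",
    "rows1 = conn.execute('SELECT id, x FROM t1').fetchall()\n",
    "rows2 = conn.execute('SELECT id, y FROM t2').fetchall()\n",
    "imperative = []\n",
    "for id1, x in rows1:\n",
    "    for id2, y in rows2:\n",
    "        if id1 == id2:\n",
    "            imperative.append((id1, x, y))\n",
    "imperative.sort()\n",
    "\n",
    "# Both approaches produce the same joined rows\n",
    "print(declarative == imperative)\n",
    "conn.close()\n",
    "```"
   ]
  },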
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Popularity of SQL\n",
    "Common standards and the most popular programming languages and environments change all the time. It's an eternal struggle for data scientists as well as programmers of all kinds, and a matter of consistent anxiety. Presently, Python is the most widely used tool for data science, but will we all have to drop Python soon and [teach ourselves Julia](https://towardsdatascience.com/bye-bye-python-hello-julia-9230bff0df62)? \n",
    "\n",
    "In this context, it is stunning that SQL has been so widely used since the 1970s. According to a [Stack Overflow survey](https://insights.stackoverflow.com/survey/2017), SQL is the one of most widely used programming languages among the people who filled out the survey, second only to Javascript. Taking into account the high-tech biases in this specific sample, it is probably the case the SQL is more widely used than any other language mentioned in this survey. What accounts for this popularity?\n",
    "\n",
    "This [blog post](https://blog.sqlizer.io/posts/sql-43/) argues that SQL achieved this level of longevity because it came to prominence during a time in which many of the baseline standards for the development of computer systems were being invented. As more and more systems were developed in a way that depends on SQL, it became harder to change this standard. But SQL is also simple and highly functional because it is a semantic language that expresses set-theoetical and logical operations. As long as relational databases are used, there's not much functionality that can be added to a query language beyond these foundational mathematical operations, and whatever additional functionality is needed can be added to a version of SQL by a particular DBMS. There are also many different open source and proprietary DBMSs that all employ SQL, so different users can have a choice over many different DBMSs and platforms without having to learn a query language other than SQL.\n",
    "\n",
    "That said, there's much less of a reason to use SQL when the database is not organized according to the relational model. NoSQL databases have much more flexible schema in general, and can store the data in one big table or in as many tables as there are records, or even datapoints, in the database. In fact, without a relational schema, the notion of a data table makes less sense in general. For example, a document store is a collection of individual records encoded using JSON or XML, and not as tables. These records can be sharded: stored in many corresponding servers in a distributed system to address challenges with the size of the database and the speed with which database transactions are conducted. Without tables, NoSQL DBMSs do not usually use SQL. MongoDB, for example, works with queries that are themselves in JSON format."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create, Read, Update, and Delete (CRUD) Operations\n",
    "**Persistent storage** refers to a system in which [data outlives the process that created it](https://en.wikipedia.org/wiki/Persistence_(computer_science)). When you work with software that allows you to save a file, the file is stored in persistent storage because it still exists even after you close the software application. Hard drives are examples of persistent storage, as are local and remote servers that store databases. Any persistent storage mechanism must have methods for creating, reading (or loading), updating (or editing), and deleting the data in that storage device. Create, Read, Update, and Delete are the CRUD Operations.\n",
    "\n",
    "We've previously employed CRUD operations using the `requests` library to use an API or to do web scraping. Like `requests`, SQL and other query languages have CRUD operations. The following table, adapted from a similar one that appears on the [Wikipedia page for the CRUD operations](https://en.wikipedia.org/wiki/Create%2C_read%2C_update_and_delete), shows these operations in the `requests` package, SQL, and the MongoDB query language:\n",
    "\n",
    "|Operation|`requests` method|SQL|MongoDB|\n",
    "|:-|:-|:-|:-|\n",
    "|Create|`requests.put()` or `requests.post()`|`INSERT`|`Insert`|\n",
    "|Read|`requests.get()`|`SELECT`|`Find`|\n",
    "|Update|`requests.patch()`|`UPDATE`|`Update`|\n",
    "|Delete|`requests.delete()`|`DELETE`|`Remove`|\n",
    "\n",
    "As a data scientist, you will most often use read operations to obtain the data you need for your analysis. However, if you are collecting original data for your project, the create, update, and delete operations become much more important. We will discuss all four operations and their variants in the context of SQL and MongoDB below.\n",
    "\n",
    "We can work with SQL using `pandas` if we first create an engine that links to a specific DBMS, server, and database with `create_engine` from `sqlalchemy`. Once we do, the `pd.read_sql_query()` function makes read operations straightforward, and the `.execute()` method applied to the engine lets us easily issue create, update, and delete commands."
   ]
  },
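  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the SQL column of the table concrete, here is a minimal sketch of all four CRUD operations. It is an illustration under stated assumptions rather than the setup used in the rest of this chapter: it relies on Python's built-in `sqlite3` module and an in-memory database instead of `sqlalchemy` and a PostgreSQL server, and the `teams` table and its rows are invented for the example.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# An in-memory SQLite database with one hypothetical table\n",
    "conn = sqlite3.connect(':memory:')\n",
    "conn.execute('CREATE TABLE teams (city TEXT, team TEXT)')\n",
    "\n",
    "# Create: INSERT adds new records\n",
    "conn.execute('INSERT INTO teams VALUES (?, ?)', ('Boston', 'Boston Celtics'))\n",
    "conn.execute('INSERT INTO teams VALUES (?, ?)', ('Miami', 'Miami Heat'))\n",
    "\n",
    "# Read: SELECT returns records without changing them\n",
    "after_create = conn.execute('SELECT city, team FROM teams ORDER BY city').fetchall()\n",
    "\n",
    "# Update: UPDATE edits the records that match the WHERE condition\n",
    "conn.execute('UPDATE teams SET team = ? WHERE city = ?', ('Heat', 'Miami'))\n",
    "\n",
    "# Delete: DELETE removes the records that match the WHERE condition\n",
    "conn.execute('DELETE FROM teams WHERE city = ?', ('Boston',))\n",
    "\n",
    "after_changes = conn.execute('SELECT city, team FROM teams').fetchall()\n",
    "print(after_create)\n",
    "print(after_changes)\n",
    "conn.close()\n",
    "```\n",
    "\n",
    "The `?` placeholders pass values separately from the SQL text, which avoids having to embed quoted values in the query string."
   ]
  },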
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SQL Style: Capitalization, Quotes, New Lines, Indentation\n",
    "There are many ways to write an SQL query, and when you look at someone else's SQL code you will see a variety of styles. Mostly, with the exception of quotes in some cases, stylistic differences don't change the behavior of the code, but they can have an impact on how easy the code is for other people to read and understand.\n",
    "\n",
    "One requirement for SQL code is that the query must end with a semi-colon, and that no semi-colons appear elsewhere in the query. As long as that requirement is met, other stylistic choices are possible.\n",
    "\n",
    "The least readable way to write an SQL query is to write the entire code on one line, with no capitalization or indentation. The following code is valid SQL code:\n",
    "```\n",
    "select t.id, t.column1, t.column2, t.column3, r.column4 from table1 t inner join table2 r on t.id = r.id where column1>100 order by column2 desc;\n",
    "```\n",
    "We will discuss exactly what this query does. But for now, let's focus on the presentation of code. SQL uses **clauses** to represent particular functions for reading and writing data. In the above query, `select`, `from`, `inner join`, `on`, `where`, and `order by` are all clauses, and `desc` is an option applied to the `order by` clause. \n",
    "\n",
    "One stylistic choice many people make is to write SQL clauses in capital letters. That helps readers to quickly see the parts of the code that are clauses as opposed to the rest of the code that contains column names, table names, values, and aliases. If we capitalize the clauses and options in the SQL query, it looks like this:\n",
    "```\n",
    "SELECT t.id, t.column1, t.column2, t.column3, r.column4 FROM table1 t INNER JOIN table2 r ON t.id = r.id WHERE column1>100 ORDER BY column2 DESC;\n",
    "```\n",
    "Another stylistic choice people make to present the code in a more reabable way is to put clauses that are considered distinct enough from other clauses on new lines. The one common exception is `FROM`, which is considered to be closely related to `SELECT` and is often written on the same line as `SELECT`. If we put each clause other than `FROM` on a new line, the query looks like this:\n",
    "```\n",
    "SELECT t.id, t.column1, t.column2, t.column3, r.column4 FROM table1 t \n",
    "INNER JOIN table2 r \n",
    "ON t.id = r.id \n",
    "WHERE column1>100 \n",
    "ORDER BY column2 DESC;\n",
    "```\n",
    "Some clauses are considered to be elaborations upon a previous clause. Column names after `SELECT` are usually written on the same line as `SELECT`, but if these columns themselves require functions that take up more space, it is useful to put them on new lines. Likewise, `ON` is considered an elaboration of `INNER JOIN`. These lines of code are often indented to express the dependence on the previous line. If we include indentation in the code, the query is\n",
    "```\n",
    "SELECT \n",
    "    t.id, \n",
    "    t.column1, \n",
    "    t.column2, \n",
    "    t.column3, \n",
    "    r.column4 \n",
    "FROM table1 t \n",
    "INNER JOIN table2 r \n",
    "    ON t.id = r.id \n",
    "WHERE column1>100 \n",
    "ORDER BY column2 DESC;\n",
    "```\n",
    "I encourage you to develop good habits with how you write the SQL queries, both for other people to read your code, but more importantly, to make it easier for you yourself to read your code. You will be spending a lot of time developing and debugging SQL queries, and anything you do that cuts down the time to understand your own code will save you a lot of time and frustration in the long-run.\n",
    "\n",
    "Quotes are only used in SQL code when referring to values of a character feature in one of the data tables. When using quotes while working in Python, it is important to use single quotes, not double quotes, to ensure that the quote that is internal to SQL is not read as a termination of the Python variable that contains the SQL code. That is, it is fine to write a clause that looks like `WHERE column = 'value'`, but not `WHERE column = \"value\"`.\n",
    "\n",
    "For all of the queries we will write in the following examples, we will store the query as a string variable in Python. We will use the triple-quote syntax, which allows us to write a string that exists on multiple lines. So our SQL query definitions will look like this:\n",
    "```\n",
    "myquery = \"\"\"\n",
    "SELECT \n",
    "    t.id, \n",
    "    t.column1, \n",
    "    t.column2, \n",
    "    t.column3, \n",
    "    r.column4 \n",
    "FROM table1 t \n",
    "INNER JOIN table2 r \n",
    "    ON t.id = r.id \n",
    "WHERE column1>100 \n",
    "ORDER BY column2 DESC;\n",
    "\"\"\"\n",
    "```\n",
    "We will then be able to pass the `myquery` variable to functions like `pd.read_sql_query()` to be evaluated."
   ]
  },
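  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The claim that style does not change behavior can be checked directly. The sketch below is an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for a database server, and the contents of `table1` and `table2` are invented. It runs a shortened version of the query above twice, once written on a single lowercase line and once stored in a triple-quoted string with capitalization and indentation.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Hypothetical tables with the same names as in the style examples above\n",
    "conn = sqlite3.connect(':memory:')\n",
    "conn.execute('CREATE TABLE table1 (id INTEGER, column1 INTEGER, column2 TEXT)')\n",
    "conn.execute('CREATE TABLE table2 (id INTEGER, column4 TEXT)')\n",
    "conn.executemany('INSERT INTO table1 VALUES (?, ?, ?)',\n",
    "                 [(1, 50, 'a'), (2, 150, 'b'), (3, 200, 'c')])\n",
    "conn.executemany('INSERT INTO table2 VALUES (?, ?)',\n",
    "                 [(1, 'x'), (2, 'y'), (3, 'z')])\n",
    "\n",
    "# The least readable style: one line, no capitalization\n",
    "one_line = 'select t.id, t.column2, r.column4 from table1 t inner join table2 r on t.id = r.id where column1>100 order by column2 desc;'\n",
    "\n",
    "# The same query with capitalized clauses, new lines, and indentation\n",
    "formatted = \"\"\"\n",
    "SELECT\n",
    "    t.id,\n",
    "    t.column2,\n",
    "    r.column4\n",
    "FROM table1 t\n",
    "INNER JOIN table2 r\n",
    "    ON t.id = r.id\n",
    "WHERE column1>100\n",
    "ORDER BY column2 DESC;\n",
    "\"\"\"\n",
    "\n",
    "result_one_line = conn.execute(one_line).fetchall()\n",
    "result_formatted = conn.execute(formatted).fetchall()\n",
    "\n",
    "# The two styles return identical rows\n",
    "print(result_one_line == result_formatted)\n",
    "conn.close()\n",
    "```"
   ]
  },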
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SQL Joins\n",
    "The simplest SQL commands for reading data from a database are `SELECT` and `FROM`. In module 6, we issued the following query to read the entire \"reviews\" entity from the wine reviews database:\n",
    "```\n",
    "SELECT * FROM reviews\n",
    "```\n",
    "But this query does not read data from the other entities in the database. It pulls all of the rows and columns from \"reviews\" and it does not manipulate the data within \"reviews\" in any way. That might not be the best way to create a dataframe to conduct analyses.\n",
    "\n",
    "SQL read operations get data, but they can also clean data at the same time. Cleaning data is an important challenge. Even when the data are stored in a well-organized database, that organization might not be the right format for the data given the analyses we intend to do. \"[Tidy Data](https://www.jstatsoft.org/article/view/v059i10)\" by Hadley Wickham lays out a philosophy of what it means to clean a dataset. The goal is to put data into a format in which modeling and visualization is as easy as possible. There are two steps: first we create **tidy data** and then we manipulate the data to fit our specific needs. Tidy data is defined as follows:\n",
    "\n",
    "> A dataset is messy or tidy depending on how rows, columns and tables are matched up with observations, [features] and types. In tidy data:\n",
    "> 1. Each [feature] forms a column.\n",
    "> 2. Each observation forms a row.\n",
    "> 3. Each type of observational unit forms a table (p. 2).\n",
    "\n",
    "In other words, the dataset we will use in an analysis must exist in one table, the rows of the table must represent records (also called observations), the columns of the table must represent features, and the rows must represent comparable units. For example, if the data contain records from the 50 U.S. States, there should be a row for each state, and there should not be a row for regions or for the whole country as these units are not comparable to states. Data from a relational database are not generally in tidy format because the relevant data exists in multiple tables. The first step is to combine these tables into one dataset using **join** functions within SQL read operations. There are many different kinds of joins, and the easiest way to see the difference between these types is to see what they do to real data. \n",
    "\n",
    "Before we discuss specific examples of how to use SQL, we load the following libraries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "import sys\n",
    "import os\n",
    "import psycopg2\n",
    "from sqlalchemy import create_engine\n",
    "import dotenv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example Database: NFL and NBA Teams\n",
    "As an example, I create a PostgreSQL database that contains two tables: \"nfl\" contains the location and team name of all 32 NFL teams:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>New York</td>\n",
       "      <td>[New York Jets, New York Giants]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cincinnati</td>\n",
       "      <td>Cincinnati Bengals</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Pittsburgh</td>\n",
       "      <td>Pittsburgh Steelers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Baltimore</td>\n",
       "      <td>Baltimore Ravens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Kansas City</td>\n",
       "      <td>Kansas City Chiefs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Las Vegas</td>\n",
       "      <td>Las Vegas Raiders</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>[L.A. Chargers, L.A. Rams]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Nashville</td>\n",
       "      <td>Tennessee Titans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Jacksonville</td>\n",
       "      <td>Jacksonville Jaguars</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Tampa Bay</td>\n",
       "      <td>Tampa Bay Buccaneers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>Seattle Seahawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Green Bay</td>\n",
       "      <td>Green Bay Packers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                      footballteam\n",
       "0         Buffalo                     Buffalo Bills\n",
       "1           Miami                    Miami Dolphins\n",
       "2          Boston              New England Patriots\n",
       "3        New York  [New York Jets, New York Giants]\n",
       "4       Cleveland                  Cleveland Browns\n",
       "5      Cincinnati                Cincinnati Bengals\n",
       "6      Pittsburgh               Pittsburgh Steelers\n",
       "7       Baltimore                  Baltimore Ravens\n",
       "8     Kansas City                Kansas City Chiefs\n",
       "9       Las Vegas                 Las Vegas Raiders\n",
       "10    Los Angeles        [L.A. Chargers, L.A. Rams]\n",
       "11         Denver                    Denver Broncos\n",
       "12      Nashville                  Tennessee Titans\n",
       "13   Jacksonville              Jacksonville Jaguars\n",
       "14        Houston                    Houston Texans\n",
       "15   Indianapolis                Indianapolis Colts\n",
       "16   Philadelphia               Philadelphia Eagles\n",
       "17         Dallas                    Dallas Cowboys\n",
       "18     Washington                  Washington Skins\n",
       "19        Atlanta                   Atlanta Falcons\n",
       "20      Charlotte                 Carolina Panthers\n",
       "21      Tampa Bay              Tampa Bay Buccaneers\n",
       "22    New Orleans                New Orleans Saints\n",
       "23  San Francisco               San Francisco 49ers\n",
       "24        Phoenix                 Arizona Cardinals\n",
       "25        Seattle                  Seattle Seahawks\n",
       "26        Chicago                     Chicago Bears\n",
       "27      Green Bay                 Green Bay Packers\n",
       "28    Minneapolis                 Minnesota Vikings\n",
       "29        Detroit                     Detroit Lions"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nfl_dict = {'city':['Buffalo','Miami','Boston','New York','Cleveland','Cincinnati',\n",
    "                       'Pittsburgh','Baltimore','Kansas City','Las Vegas','Los Angeles','Denver',\n",
    "                       'Nashville','Jacksonville','Houston','Indianapolis','Philadelphia','Dallas',\n",
    "                       'Washington','Atlanta','Charlotte','Tampa Bay','New Orleans','San Francisco',\n",
    "                       'Phoenix', 'Seattle','Chicago','Green Bay','Minneapolis','Detroit'],\n",
    "           'footballteam':['Buffalo Bills','Miami Dolphins','New England Patriots',\n",
    "                           ['New York Jets', 'New York Giants'],'Cleveland Browns','Cincinnati Bengals',\n",
    "                          'Pittsburgh Steelers','Baltimore Ravens','Kansas City Chiefs',\n",
    "                           'Las Vegas Raiders',['L.A. Chargers','L.A. Rams'],'Denver Broncos',\n",
    "                          'Tennessee Titans','Jacksonville Jaguars','Houston Texans',\n",
    "                           'Indianapolis Colts','Philadelphia Eagles','Dallas Cowboys',\n",
    "                           'Washington Skins','Atlanta Falcons','Carolina Panthers',\n",
    "                           'Tampa Bay Buccaneers','New Orleans Saints', 'San Francisco 49ers',\n",
    "                           'Arizona Cardinals','Seattle Seahawks','Chicago Bears',\n",
    "                           'Green Bay Packers','Minnesota Vikings','Detroit Lions']}\n",
    "nfl_df = pd.DataFrame(nfl_dict)\n",
    "nfl_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This table is not in first normal form because the data are non-atomic (two teams from New York and two in Los Angeles), but this form is useful for illustrating what different SQL joins do. The second table contains the same information about NBA teams:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>Brooklyn Nets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Toronto</td>\n",
       "      <td>Toronto Raptors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Milwaukee</td>\n",
       "      <td>Milwaukee Bucks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Orlando</td>\n",
       "      <td>Orlando Magic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>[L.A. Lakers, L.A. Clippers]</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Portland</td>\n",
       "      <td>Portland Trailblazers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Sacramento</td>\n",
       "      <td>Sacramento Kings</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>San Antonio</td>\n",
       "      <td>San Antonio Spurs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Oklahoma City</td>\n",
       "      <td>Oklahoma City Thunder</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Salt Lake City</td>\n",
       "      <td>Utah Jazz</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Memphis</td>\n",
       "      <td>Memphis Grizzlies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              city                basketballteam\n",
       "0           Boston                Boston Celtics\n",
       "1         New York               New York Knicks\n",
       "2     Philadelphia            Philadelphia 76ers\n",
       "3         Brooklyn                 Brooklyn Nets\n",
       "4          Toronto               Toronto Raptors\n",
       "5        Cleveland           Cleveland Cavaliers\n",
       "6          Chicago                 Chicago Bulls\n",
       "7          Detroit               Detroit Pistons\n",
       "8        Milwaukee               Milwaukee Bucks\n",
       "9     Indianapolis                Indiana Pacers\n",
       "10         Atlanta                 Atlanta Hawks\n",
       "11      Washington            Washington Wizards\n",
       "12         Orlando                 Orlando Magic\n",
       "13           Miami                    Miami Heat\n",
       "14       Charlotte             Charlotte Hornets\n",
       "15     Los Angeles  [L.A. Lakers, L.A. Clippers]\n",
       "16   San Francisco         Golden State Warriors\n",
       "17        Portland         Portland Trailblazers\n",
       "18      Sacramento              Sacramento Kings\n",
       "19         Phoenix                  Phoenix Suns\n",
       "20     San Antonio             San Antonio Spurs\n",
       "21          Dallas              Dallas Mavericks\n",
       "22         Houston               Houston Rockets\n",
       "23   Oklahoma City         Oklahoma City Thunder\n",
       "24     Minneapolis        Minnesota Timberwolves\n",
       "25          Denver                Denver Nuggets\n",
       "26  Salt Lake City                     Utah Jazz\n",
       "27         Memphis             Memphis Grizzlies\n",
       "28     New Orleans          New Orleans Pelicans"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "nba_dict = {'city':['Boston','New York','Philadelphia','Brooklyn','Toronto',\n",
    "                   'Cleveland','Chicago','Detroit','Milwaukee','Indianapolis',\n",
    "                   'Atlanta', 'Washington','Orlando','Miami','Charlotte',\n",
    "                   'Los Angeles','San Francisco','Portland','Sacramento',\n",
    "                   'Phoenix','San Antonio','Dallas','Houston','Oklahoma City',\n",
    "                   'Minneapolis','Denver','Salt Lake City','Memphis','New Orleans'],\n",
    "           'basketballteam':['Boston Celtics','New York Knicks','Philadelphia 76ers',\n",
    "                             'Brooklyn Nets','Toronto Raptors',\n",
    "                            'Cleveland Cavaliers','Chicago Bulls','Detroit Pistons',\n",
    "                             'Milwaukee Bucks','Indiana Pacers',\n",
    "                            'Atlanta Hawks','Washington Wizards','Orlando Magic',\n",
    "                             'Miami Heat','Charlotte Hornets',\n",
    "                            ['L.A. Lakers','L.A. Clippers'],'Golden State Warriors',\n",
    "                             'Portland Trailblazers','Sacramento Kings',\n",
    "                            'Phoenix Suns','San Antonio Spurs','Dallas Mavericks',\n",
    "                             'Houston Rockets','Oklahoma City Thunder',\n",
    "                            'Minnesota Timberwolves','Denver Nuggets',\n",
    "                             'Utah Jazz','Memphis Grizzlies','New Orleans Pelicans']}\n",
    "nba_df = pd.DataFrame(nba_dict)\n",
    "nba_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To create a PostgreSQL database with entities for the NFL and NBA teams, I first connect to the PostgreSQL server running on my local computer (see module 6 for a more detailed discussion of how this works). First I bring my PostgreSQL password into the local environment:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "dotenv.load_dotenv()\n",
    "pgpassword = os.getenv(\"pgpassword\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then I access the server and establish a cursor for the server:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "dbserver = psycopg2.connect(\n",
    "    user='jk8sd', \n",
    "    password=pgpassword, \n",
    "    host=\"localhost\"\n",
    ")\n",
    "dbserver.autocommit = True\n",
    "cursor = dbserver.cursor()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I create an empty \"teams\" database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "try:\n",
    "    cursor.execute(\"CREATE DATABASE teams\")\n",
    "except:\n",
    "    cursor.execute(\"DROP DATABASE teams\")\n",
    "    cursor.execute(\"CREATE DATABASE teams\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And I use the `create_engine()` function from `sqalchemy` to allow queries to the \"teams\" database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "engine = create_engine(\"postgresql+psycopg2://{user}:{pw}@localhost/{db}\"\n",
    "                       .format(user=\"jk8sd\", pw=pgpassword, db=\"teams\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I add the `nfl_df` and `nba_df` dataframes to the \"teams\" database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "nfl_df.to_sql('nfl', con = engine, index=False, chunksize=1000, if_exists = 'replace')\n",
    "nba_df.to_sql('nba', con = engine, index=False, chunksize=1000, if_exists = 'replace')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I can now issue queries to the database. To read all of the data in the NFL table, for example, I type:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cincinnati</td>\n",
       "      <td>Cincinnati Bengals</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Pittsburgh</td>\n",
       "      <td>Pittsburgh Steelers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Baltimore</td>\n",
       "      <td>Baltimore Ravens</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Kansas City</td>\n",
       "      <td>Kansas City Chiefs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Las Vegas</td>\n",
       "      <td>Las Vegas Raiders</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Nashville</td>\n",
       "      <td>Tennessee Titans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Jacksonville</td>\n",
       "      <td>Jacksonville Jaguars</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Tampa Bay</td>\n",
       "      <td>Tampa Bay Buccaneers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>Seattle Seahawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Green Bay</td>\n",
       "      <td>Green Bay Packers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam\n",
       "0         Buffalo                        Buffalo Bills\n",
       "1           Miami                       Miami Dolphins\n",
       "2          Boston                 New England Patriots\n",
       "3        New York  {\"New York Jets\",\"New York Giants\"}\n",
       "4       Cleveland                     Cleveland Browns\n",
       "5      Cincinnati                   Cincinnati Bengals\n",
       "6      Pittsburgh                  Pittsburgh Steelers\n",
       "7       Baltimore                     Baltimore Ravens\n",
       "8     Kansas City                   Kansas City Chiefs\n",
       "9       Las Vegas                    Las Vegas Raiders\n",
       "10    Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}\n",
       "11         Denver                       Denver Broncos\n",
       "12      Nashville                     Tennessee Titans\n",
       "13   Jacksonville                 Jacksonville Jaguars\n",
       "14        Houston                       Houston Texans\n",
       "15   Indianapolis                   Indianapolis Colts\n",
       "16   Philadelphia                  Philadelphia Eagles\n",
       "17         Dallas                       Dallas Cowboys\n",
       "18     Washington                     Washington Skins\n",
       "19        Atlanta                      Atlanta Falcons\n",
       "20      Charlotte                    Carolina Panthers\n",
       "21      Tampa Bay                 Tampa Bay Buccaneers\n",
       "22    New Orleans                   New Orleans Saints\n",
       "23  San Francisco                  San Francisco 49ers\n",
       "24        Phoenix                    Arizona Cardinals\n",
       "25        Seattle                     Seattle Seahawks\n",
       "26        Chicago                        Chicago Bears\n",
       "27      Green Bay                    Green Bay Packers\n",
       "28    Minneapolis                    Minnesota Vikings\n",
       "29        Detroit                        Detroit Lions"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"SELECT * FROM nfl\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Types of Joins\n",
    "Joining data tables is the act of *adding columns* to an existing data table - that is, adding more features to existing records - by matching the rows in one table to the corresponding rows in another table. In a relational database, data tables can include a foreign key which serves as the primary key for another data table. Joins require matching a foreign key in one table to the corresponding primary key in another table. During the join, this foreign key and this primary key are both called **indices**. To perform a join with an SQL query, we specify the two tables in the database we want to join and the index in each table we will match on.\n",
    "\n",
    "In the teams database, `city` is a primary key in both the \"nfl\" and \"nba\" tables, which also makes it a foreign key in both tables. Joining the \"nfl\" and \"nba\" tables by matching on `city` creates one data table in which the rows still represent cities and the columns list both the NBA and NFL teams in each city. \n",
    "\n",
    "Not every city has both an NFL and an NBA team. Green Bay, for example, has a football team but no basketball team, and Sacramento has a basketball team but no football team. In a join, every row in a table either **matches** with one or more rows in the other table, or is **unmatched**. In this case, Cleveland in the NFL table is matched to a row in the NBA table because Cleveland has both a football and basketball team, but Oklahoma City in the NBA table is unmatched because there is no row for Oklahoma City in the NFL table.\n",
    "\n",
    "The main difference between types of joins in SQL is their treatment of unmatched records. The following table summarizes the types of joins:\n",
    "\n",
    "| Type of join | Definition                                                                                                                                  |\n",
    "|--------------|---------------------------------------------------------------------------------------------------------------------------------------------\n",
    "| Inner join   | Only keep the records that exist in both tables|\n",
    "| Left join    | Keep all the records in the first table listed, and keep only the records in the second table listed that have matches in the first table   |\n",
    "| Right join   | Keep all the records in the second table listed, and keep only the  records in the first table listed that have matches in the second table |                     \n",
    "| Full join    | Keep all of the records in both tables whether they are matched or not                                                                      |                     \n",
    "| Anti join    | Keep only the records in the first table that are not matched in the second table                                                           |                     \n",
    "|Natural join | The same as any of the joins listed above, but no need to specify the indices as these are determined automatically by finding columns with the same name. If no columns share the same name, a natural join performs a cross join. If more than one pair of columns share names across the two data tables, natural joins assume that both are part of the index to match on. Use caution.|\n",
    "|Cross join | Also called a **Cartesian product**. If the first dataframe has $M$ rows and the second dataframe has $N$ rows, the result has $M\\times N$ rows. Every row is a pairwise combination of values of each index.|"
   ]
  },
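  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough, database-free preview of these join types, the sketch below uses the pandas `merge()` method on two made-up, two-row dataframes. The names `football` and `basketball` are illustrative only and are not the tables in the teams database. The sketch assumes pandas 1.2 or later, which added `how='cross'`. pandas has no anti join option, so that case is emulated here with `indicator=True`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Two tiny, made-up tables that share the key column 'city'\n",
    "football = pd.DataFrame({'city': ['Cleveland', 'Green Bay'],\n",
    "                         'footballteam': ['Cleveland Browns', 'Green Bay Packers']})\n",
    "basketball = pd.DataFrame({'city': ['Cleveland', 'Sacramento'],\n",
    "                           'basketballteam': ['Cleveland Cavaliers', 'Sacramento Kings']})\n",
    "\n",
    "# The how argument of merge() plays the role of the SQL join type;\n",
    "# 'outer' corresponds to a full join\n",
    "for how in ['inner', 'left', 'right', 'outer']:\n",
    "    joined = football.merge(basketball, how=how, on='city')\n",
    "    print(how, 'join:', len(joined), 'rows')\n",
    "\n",
    "# Cross join: every row of the first table paired with every row of the second\n",
    "crossed = football.merge(basketball, how='cross')\n",
    "print('cross join:', len(crossed), 'rows')\n",
    "\n",
    "# Anti join: rows of the first table with no match in the second\n",
    "flagged = football.merge(basketball, how='left', on='city', indicator=True)\n",
    "anti = flagged[flagged['_merge'] == 'left_only'].drop(columns='_merge')\n",
    "print('anti join:', len(anti), 'rows')\n",
    "```\n",
    "\n",
    "With these inputs, the inner join keeps 1 row (Cleveland), the left and right joins keep 2 rows each, the full join keeps 3, the cross join produces 4, and the anti join keeps 1 (Green Bay)."
   ]
  },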
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Inner Joins\n",
    "The syntax for an inner join is\n",
    "```\n",
    "SELECT * FROM table1\n",
    "INNER JOIN table2\n",
    "    ON table1.index_name = table2.index_name;\n",
    "```\n",
    "where `table1` and `table2` are the data tables we are joining, and `table1.index_name` and `table2.index_name` are the columns that contain the indices for tables 1 and 2. Alternatively, inner join is the default type of join, so that this syntax\n",
    "```\n",
    "SELECT * FROM table1\n",
    "JOIN table2\n",
    "    ON table1.index_name = table2.index_name;\n",
    "```\n",
    "also produces an inner join. I recommend typing `INNER JOIN`, however, to avoid confusing this type of join with other types.\n",
    "\n",
    "In the case of the teams database, an inner join of the NFL and NBA tables yields a dataframe with one row for every city that has both a basketball and a football team. The SQL query that generates this data frame is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam           city  \\\n",
       "0           Miami                       Miami Dolphins          Miami   \n",
       "1          Boston                 New England Patriots         Boston   \n",
       "2        New York  {\"New York Jets\",\"New York Giants\"}       New York   \n",
       "3       Cleveland                     Cleveland Browns      Cleveland   \n",
       "4     Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}    Los Angeles   \n",
       "5          Denver                       Denver Broncos         Denver   \n",
       "6         Houston                       Houston Texans        Houston   \n",
       "7    Indianapolis                   Indianapolis Colts   Indianapolis   \n",
       "8    Philadelphia                  Philadelphia Eagles   Philadelphia   \n",
       "9          Dallas                       Dallas Cowboys         Dallas   \n",
       "10     Washington                     Washington Skins     Washington   \n",
       "11        Atlanta                      Atlanta Falcons        Atlanta   \n",
       "12      Charlotte                    Carolina Panthers      Charlotte   \n",
       "13    New Orleans                   New Orleans Saints    New Orleans   \n",
       "14  San Francisco                  San Francisco 49ers  San Francisco   \n",
       "15        Phoenix                    Arizona Cardinals        Phoenix   \n",
       "16        Chicago                        Chicago Bears        Chicago   \n",
       "17    Minneapolis                    Minnesota Vikings    Minneapolis   \n",
       "18        Detroit                        Detroit Lions        Detroit   \n",
       "\n",
       "                     basketballteam  \n",
       "0                        Miami Heat  \n",
       "1                    Boston Celtics  \n",
       "2                   New York Knicks  \n",
       "3               Cleveland Cavaliers  \n",
       "4   {\"L.A. Lakers\",\"L.A. Clippers\"}  \n",
       "5                    Denver Nuggets  \n",
       "6                   Houston Rockets  \n",
       "7                    Indiana Pacers  \n",
       "8                Philadelphia 76ers  \n",
       "9                  Dallas Mavericks  \n",
       "10               Washington Wizards  \n",
       "11                    Atlanta Hawks  \n",
       "12                Charlotte Hornets  \n",
       "13             New Orleans Pelicans  \n",
       "14            Golden State Warriors  \n",
       "15                     Phoenix Suns  \n",
       "16                    Chicago Bulls  \n",
       "17           Minnesota Timberwolves  \n",
       "18                  Detroit Pistons  "
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl\n",
    "INNER JOIN nba\n",
    "    ON nfl.city = nba.city;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three quotations that come before and after the SQL code is Python syntax that allow for a string to be entered on multiple lines. With just one quote, Python would assume that the next line should be read as Python code, and will produce an error. Three quotes allows us to space out the components of the SQL query on separate lines to make the SQL code easier to read and understand.\n",
    "\n",
    "SQL queries can be written on multiple lines, but the last line (and only the last line) must conclude with a semicolon.\n",
    "\n",
    "Another way to write the inner join query is to use **aliasing**: specifying a smaller name or a single letter next to each data table in the query to simplify the syntax for `ON`. For example, I can alias the NFL data with `f` and the NBA data with `b`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam           city  \\\n",
       "0           Miami                       Miami Dolphins          Miami   \n",
       "1          Boston                 New England Patriots         Boston   \n",
       "2        New York  {\"New York Jets\",\"New York Giants\"}       New York   \n",
       "3       Cleveland                     Cleveland Browns      Cleveland   \n",
       "4     Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}    Los Angeles   \n",
       "5          Denver                       Denver Broncos         Denver   \n",
       "6         Houston                       Houston Texans        Houston   \n",
       "7    Indianapolis                   Indianapolis Colts   Indianapolis   \n",
       "8    Philadelphia                  Philadelphia Eagles   Philadelphia   \n",
       "9          Dallas                       Dallas Cowboys         Dallas   \n",
       "10     Washington                     Washington Skins     Washington   \n",
       "11        Atlanta                      Atlanta Falcons        Atlanta   \n",
       "12      Charlotte                    Carolina Panthers      Charlotte   \n",
       "13    New Orleans                   New Orleans Saints    New Orleans   \n",
       "14  San Francisco                  San Francisco 49ers  San Francisco   \n",
       "15        Phoenix                    Arizona Cardinals        Phoenix   \n",
       "16        Chicago                        Chicago Bears        Chicago   \n",
       "17    Minneapolis                    Minnesota Vikings    Minneapolis   \n",
       "18        Detroit                        Detroit Lions        Detroit   \n",
       "\n",
       "                     basketballteam  \n",
       "0                        Miami Heat  \n",
       "1                    Boston Celtics  \n",
       "2                   New York Knicks  \n",
       "3               Cleveland Cavaliers  \n",
       "4   {\"L.A. Lakers\",\"L.A. Clippers\"}  \n",
       "5                    Denver Nuggets  \n",
       "6                   Houston Rockets  \n",
       "7                    Indiana Pacers  \n",
       "8                Philadelphia 76ers  \n",
       "9                  Dallas Mavericks  \n",
       "10               Washington Wizards  \n",
       "11                    Atlanta Hawks  \n",
       "12                Charlotte Hornets  \n",
       "13             New Orleans Pelicans  \n",
       "14            Golden State Warriors  \n",
       "15                     Phoenix Suns  \n",
       "16                    Chicago Bulls  \n",
       "17           Minnesota Timberwolves  \n",
       "18                  Detroit Pistons  "
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl f\n",
    "INNER JOIN nba b\n",
    "    ON f.city = b.city;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two indices we match on do not necessarily have to have the same name. Supposing that the \"city\" column in each data table was named \"location\" in the NFL table and \"town\" in the NBA table, the syntax for the inner join would have been:\n",
    "```\n",
    "SELECT * FROM nfl f\n",
    "INNER JOIN nba b\n",
    "    ON f.location = b.town;\n",
    "```"
   ]
  },
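  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a minimal, self-contained sketch of matching on differently named columns, the example below builds two tiny tables in an in-memory SQLite database. The use of `sqlite3` in place of the `engine` connection, the `location` and `town` column names, and the handful of rows are all assumptions made for illustration, not the actual teams database:\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# In-memory SQLite database standing in for the teams database (illustrative assumption).\n",
    "con = sqlite3.connect(\":memory:\")\n",
    "con.executescript(\"\"\"\n",
    "CREATE TABLE nfl (location TEXT, footballteam TEXT);\n",
    "CREATE TABLE nba (town TEXT, basketballteam TEXT);\n",
    "INSERT INTO nfl VALUES ('Miami', 'Miami Dolphins'), ('Buffalo', 'Buffalo Bills');\n",
    "INSERT INTO nba VALUES ('Miami', 'Miami Heat'), ('Toronto', 'Toronto Raptors');\n",
    "\"\"\")\n",
    "\n",
    "# The ON clause pairs f.location with b.town even though the names differ.\n",
    "myquery = \"\"\"\n",
    "SELECT f.location, f.footballteam, b.basketballteam\n",
    "FROM nfl f\n",
    "INNER JOIN nba b\n",
    "    ON f.location = b.town;\n",
    "\"\"\"\n",
    "rows = con.execute(myquery).fetchall()\n",
    "print(rows)  # only Miami appears in both tables\n",
    "con.close()\n",
    "```\n",
    "\n",
    "Listing the columns explicitly after `SELECT`, instead of using `*`, also keeps the matching column from appearing twice in the result."
   ]
  },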
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Left and Right Joins\n",
    "The syntax for a left join is\n",
    "```\n",
    "SELECT * FROM table1\n",
    "LEFT JOIN table2\n",
    "    ON table1.index_name = table2.index_name;\n",
    "```\n",
    "and the syntax for a right join is\n",
    "```\n",
    "SELECT * FROM table1\n",
    "RIGHT JOIN table2\n",
    "    ON table1.index_name = table2.index_name;\n",
    "```\n",
    "In the case of the teams database, if we list the NFL table next to `FROM` and the NBA data with the `JOIN` statement, then left join lists all of the cities with an NFL team, and also displays the NBA team in that city if one exists. Otherwise, the syntax places `None` in the cell where the NBA team would be. For the teams database, the syntax for a left join is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cincinnati</td>\n",
       "      <td>Cincinnati Bengals</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Pittsburgh</td>\n",
       "      <td>Pittsburgh Steelers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Baltimore</td>\n",
       "      <td>Baltimore Ravens</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Kansas City</td>\n",
       "      <td>Kansas City Chiefs</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Las Vegas</td>\n",
       "      <td>Las Vegas Raiders</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Nashville</td>\n",
       "      <td>Tennessee Titans</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Jacksonville</td>\n",
       "      <td>Jacksonville Jaguars</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Tampa Bay</td>\n",
       "      <td>Tampa Bay Buccaneers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>Seattle Seahawks</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Green Bay</td>\n",
       "      <td>Green Bay Packers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam           city  \\\n",
       "0         Buffalo                        Buffalo Bills           None   \n",
       "1           Miami                       Miami Dolphins          Miami   \n",
       "2          Boston                 New England Patriots         Boston   \n",
       "3        New York  {\"New York Jets\",\"New York Giants\"}       New York   \n",
       "4       Cleveland                     Cleveland Browns      Cleveland   \n",
       "5      Cincinnati                   Cincinnati Bengals           None   \n",
       "6      Pittsburgh                  Pittsburgh Steelers           None   \n",
       "7       Baltimore                     Baltimore Ravens           None   \n",
       "8     Kansas City                   Kansas City Chiefs           None   \n",
       "9       Las Vegas                    Las Vegas Raiders           None   \n",
       "10    Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}    Los Angeles   \n",
       "11         Denver                       Denver Broncos         Denver   \n",
       "12      Nashville                     Tennessee Titans           None   \n",
       "13   Jacksonville                 Jacksonville Jaguars           None   \n",
       "14        Houston                       Houston Texans        Houston   \n",
       "15   Indianapolis                   Indianapolis Colts   Indianapolis   \n",
       "16   Philadelphia                  Philadelphia Eagles   Philadelphia   \n",
       "17         Dallas                       Dallas Cowboys         Dallas   \n",
       "18     Washington                     Washington Skins     Washington   \n",
       "19        Atlanta                      Atlanta Falcons        Atlanta   \n",
       "20      Charlotte                    Carolina Panthers      Charlotte   \n",
       "21      Tampa Bay                 Tampa Bay Buccaneers           None   \n",
       "22    New Orleans                   New Orleans Saints    New Orleans   \n",
       "23  San Francisco                  San Francisco 49ers  San Francisco   \n",
       "24        Phoenix                    Arizona Cardinals        Phoenix   \n",
       "25        Seattle                     Seattle Seahawks           None   \n",
       "26        Chicago                        Chicago Bears        Chicago   \n",
       "27      Green Bay                    Green Bay Packers           None   \n",
       "28    Minneapolis                    Minnesota Vikings    Minneapolis   \n",
       "29        Detroit                        Detroit Lions        Detroit   \n",
       "\n",
       "                     basketballteam  \n",
       "0                              None  \n",
       "1                        Miami Heat  \n",
       "2                    Boston Celtics  \n",
       "3                   New York Knicks  \n",
       "4               Cleveland Cavaliers  \n",
       "5                              None  \n",
       "6                              None  \n",
       "7                              None  \n",
       "8                              None  \n",
       "9                              None  \n",
       "10  {\"L.A. Lakers\",\"L.A. Clippers\"}  \n",
       "11                   Denver Nuggets  \n",
       "12                             None  \n",
       "13                             None  \n",
       "14                  Houston Rockets  \n",
       "15                   Indiana Pacers  \n",
       "16               Philadelphia 76ers  \n",
       "17                 Dallas Mavericks  \n",
       "18               Washington Wizards  \n",
       "19                    Atlanta Hawks  \n",
       "20                Charlotte Hornets  \n",
       "21                             None  \n",
       "22             New Orleans Pelicans  \n",
       "23            Golden State Warriors  \n",
       "24                     Phoenix Suns  \n",
       "25                             None  \n",
       "26                    Chicago Bulls  \n",
       "27                             None  \n",
       "28           Minnesota Timberwolves  \n",
       "29                  Detroit Pistons  "
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl f\n",
    "LEFT JOIN nba b\n",
    "    ON f.city = b.city;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
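  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `None` entries in the left join output are SQL `NULL` values. As a small sketch of how they arise, and of how they can be used to find rows without a match, the example below uses an in-memory SQLite database with two made-up rows. The `sqlite3` connection and the sample rows are assumptions for illustration and stand in for the `engine` connection and the full teams database:\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# In-memory SQLite database with a tiny, illustrative subset of teams.\n",
    "con = sqlite3.connect(\":memory:\")\n",
    "con.executescript(\"\"\"\n",
    "CREATE TABLE nfl (city TEXT, footballteam TEXT);\n",
    "CREATE TABLE nba (city TEXT, basketballteam TEXT);\n",
    "INSERT INTO nfl VALUES ('Buffalo', 'Buffalo Bills'), ('Miami', 'Miami Dolphins');\n",
    "INSERT INTO nba VALUES ('Miami', 'Miami Heat');\n",
    "\"\"\")\n",
    "\n",
    "# Every NFL row is kept; a missing NBA match comes back as NULL (None in Python).\n",
    "left_join = \"\"\"\n",
    "SELECT f.city, f.footballteam, b.basketballteam\n",
    "FROM nfl f\n",
    "LEFT JOIN nba b\n",
    "    ON f.city = b.city\n",
    "ORDER BY f.city;\n",
    "\"\"\"\n",
    "rows = con.execute(left_join).fetchall()\n",
    "print(rows)\n",
    "\n",
    "# Keeping only the rows where the NBA side is NULL lists NFL cities without an NBA team.\n",
    "nfl_only = \"\"\"\n",
    "SELECT f.city FROM nfl f\n",
    "LEFT JOIN nba b\n",
    "    ON f.city = b.city\n",
    "WHERE b.city IS NULL;\n",
    "\"\"\"\n",
    "unmatched = con.execute(nfl_only).fetchall()\n",
    "print(unmatched)\n",
    "con.close()\n",
    "```\n",
    "\n",
    "Filtering on `IS NULL` after a left join is a common way to list the rows of the first table that have no counterpart in the second."
   ]
  },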
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Likewise, the right join displays all the cities with an NBA team, along with the NFL team in that city, if one exists:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>Brooklyn Nets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Toronto Raptors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Milwaukee</td>\n",
       "      <td>Milwaukee Bucks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Orlando</td>\n",
       "      <td>Orlando Magic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Portland</td>\n",
       "      <td>Portland Trailblazers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Sacramento</td>\n",
       "      <td>Sacramento Kings</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>San Antonio</td>\n",
       "      <td>San Antonio Spurs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Oklahoma City</td>\n",
       "      <td>Oklahoma City Thunder</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Salt Lake City</td>\n",
       "      <td>Utah Jazz</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Memphis</td>\n",
       "      <td>Memphis Grizzlies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam            city  \\\n",
       "0          Boston                 New England Patriots          Boston   \n",
       "1        New York  {\"New York Jets\",\"New York Giants\"}        New York   \n",
       "2    Philadelphia                  Philadelphia Eagles    Philadelphia   \n",
       "3            None                                 None        Brooklyn   \n",
       "4            None                                 None         Toronto   \n",
       "5       Cleveland                     Cleveland Browns       Cleveland   \n",
       "6         Chicago                        Chicago Bears         Chicago   \n",
       "7         Detroit                        Detroit Lions         Detroit   \n",
       "8            None                                 None       Milwaukee   \n",
       "9    Indianapolis                   Indianapolis Colts    Indianapolis   \n",
       "10        Atlanta                      Atlanta Falcons         Atlanta   \n",
       "11     Washington                     Washington Skins      Washington   \n",
       "12           None                                 None         Orlando   \n",
       "13          Miami                       Miami Dolphins           Miami   \n",
       "14      Charlotte                    Carolina Panthers       Charlotte   \n",
       "15    Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}     Los Angeles   \n",
       "16  San Francisco                  San Francisco 49ers   San Francisco   \n",
       "17           None                                 None        Portland   \n",
       "18           None                                 None      Sacramento   \n",
       "19        Phoenix                    Arizona Cardinals         Phoenix   \n",
       "20           None                                 None     San Antonio   \n",
       "21         Dallas                       Dallas Cowboys          Dallas   \n",
       "22        Houston                       Houston Texans         Houston   \n",
       "23           None                                 None   Oklahoma City   \n",
       "24    Minneapolis                    Minnesota Vikings     Minneapolis   \n",
       "25         Denver                       Denver Broncos          Denver   \n",
       "26           None                                 None  Salt Lake City   \n",
       "27           None                                 None         Memphis   \n",
       "28    New Orleans                   New Orleans Saints     New Orleans   \n",
       "\n",
       "                     basketballteam  \n",
       "0                    Boston Celtics  \n",
       "1                   New York Knicks  \n",
       "2                Philadelphia 76ers  \n",
       "3                     Brooklyn Nets  \n",
       "4                   Toronto Raptors  \n",
       "5               Cleveland Cavaliers  \n",
       "6                     Chicago Bulls  \n",
       "7                   Detroit Pistons  \n",
       "8                   Milwaukee Bucks  \n",
       "9                    Indiana Pacers  \n",
       "10                    Atlanta Hawks  \n",
       "11               Washington Wizards  \n",
       "12                    Orlando Magic  \n",
       "13                       Miami Heat  \n",
       "14                Charlotte Hornets  \n",
       "15  {\"L.A. Lakers\",\"L.A. Clippers\"}  \n",
       "16            Golden State Warriors  \n",
       "17            Portland Trailblazers  \n",
       "18                 Sacramento Kings  \n",
       "19                     Phoenix Suns  \n",
       "20                San Antonio Spurs  \n",
       "21                 Dallas Mavericks  \n",
       "22                  Houston Rockets  \n",
       "23            Oklahoma City Thunder  \n",
       "24           Minnesota Timberwolves  \n",
       "25                   Denver Nuggets  \n",
       "26                        Utah Jazz  \n",
       "27                Memphis Grizzlies  \n",
       "28             New Orleans Pelicans  "
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl f\n",
    "RIGHT JOIN nba b\n",
    "    ON f.city = b.city;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For left and right joins, swapping which data table follows `FROM` and which data table follows `JOIN` has the same effect as changing a left join to a right join. \n",
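    "\n",
    "As a minimal sketch of this equivalence, assuming the same `nfl` and `nba` tables and aliases used in the queries above, the following left join should return the same set of rows as the right join shown earlier. Because the query uses `SELECT *`, the columns from `nba` appear first in the output instead of second:\n",
    "```\n",
    "SELECT * FROM nba b\n",
    "LEFT JOIN nfl f\n",
    "    ON f.city = b.city;\n",
    "```\n",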
    "\n",
    "### Full (Outer) Join\n",
    "A full join, also called a full outer join, keeps all of the records that exist in either table, whether or not they are matched. A full join returns a data frame with at least as many rows as the larger of the two data tables in the join because it contains all records that appear in either table. Most SQL tutorials warn that full joins can return massive amounts of data, and MySQL does not implement full joins. For systems like PostgreSQL in which full joins are allowed, the syntax for a full join is\n",
    "```\n",
    "SELECT * FROM table1\n",
    "FULL JOIN table2\n",
    "    ON table1.index_name = table2.index_name;\n",
    "```\n",
    "For the teams database, a full join produces a data frame with one row for every city with an NFL team or an NBA team or both:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cincinnati</td>\n",
       "      <td>Cincinnati Bengals</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Pittsburgh</td>\n",
       "      <td>Pittsburgh Steelers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Baltimore</td>\n",
       "      <td>Baltimore Ravens</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Kansas City</td>\n",
       "      <td>Kansas City Chiefs</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Las Vegas</td>\n",
       "      <td>Las Vegas Raiders</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Nashville</td>\n",
       "      <td>Tennessee Titans</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Jacksonville</td>\n",
       "      <td>Jacksonville Jaguars</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Tampa Bay</td>\n",
       "      <td>Tampa Bay Buccaneers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>Seattle Seahawks</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Green Bay</td>\n",
       "      <td>Green Bay Packers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Milwaukee</td>\n",
       "      <td>Milwaukee Bucks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Oklahoma City</td>\n",
       "      <td>Oklahoma City Thunder</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Portland</td>\n",
       "      <td>Portland Trailblazers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>Brooklyn Nets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Sacramento</td>\n",
       "      <td>Sacramento Kings</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Memphis</td>\n",
       "      <td>Memphis Grizzlies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>San Antonio</td>\n",
       "      <td>San Antonio Spurs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Salt Lake City</td>\n",
       "      <td>Utah Jazz</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Orlando</td>\n",
       "      <td>Orlando Magic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Toronto Raptors</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam            city  \\\n",
       "0         Buffalo                        Buffalo Bills            None   \n",
       "1           Miami                       Miami Dolphins           Miami   \n",
       "2          Boston                 New England Patriots          Boston   \n",
       "3        New York  {\"New York Jets\",\"New York Giants\"}        New York   \n",
       "4       Cleveland                     Cleveland Browns       Cleveland   \n",
       "5      Cincinnati                   Cincinnati Bengals            None   \n",
       "6      Pittsburgh                  Pittsburgh Steelers            None   \n",
       "7       Baltimore                     Baltimore Ravens            None   \n",
       "8     Kansas City                   Kansas City Chiefs            None   \n",
       "9       Las Vegas                    Las Vegas Raiders            None   \n",
       "10    Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}     Los Angeles   \n",
       "11         Denver                       Denver Broncos          Denver   \n",
       "12      Nashville                     Tennessee Titans            None   \n",
       "13   Jacksonville                 Jacksonville Jaguars            None   \n",
       "14        Houston                       Houston Texans         Houston   \n",
       "15   Indianapolis                   Indianapolis Colts    Indianapolis   \n",
       "16   Philadelphia                  Philadelphia Eagles    Philadelphia   \n",
       "17         Dallas                       Dallas Cowboys          Dallas   \n",
       "18     Washington                     Washington Skins      Washington   \n",
       "19        Atlanta                      Atlanta Falcons         Atlanta   \n",
       "20      Charlotte                    Carolina Panthers       Charlotte   \n",
       "21      Tampa Bay                 Tampa Bay Buccaneers            None   \n",
       "22    New Orleans                   New Orleans Saints     New Orleans   \n",
       "23  San Francisco                  San Francisco 49ers   San Francisco   \n",
       "24        Phoenix                    Arizona Cardinals         Phoenix   \n",
       "25        Seattle                     Seattle Seahawks            None   \n",
       "26        Chicago                        Chicago Bears         Chicago   \n",
       "27      Green Bay                    Green Bay Packers            None   \n",
       "28    Minneapolis                    Minnesota Vikings     Minneapolis   \n",
       "29        Detroit                        Detroit Lions         Detroit   \n",
       "30           None                                 None       Milwaukee   \n",
       "31           None                                 None   Oklahoma City   \n",
       "32           None                                 None        Portland   \n",
       "33           None                                 None        Brooklyn   \n",
       "34           None                                 None      Sacramento   \n",
       "35           None                                 None         Memphis   \n",
       "36           None                                 None     San Antonio   \n",
       "37           None                                 None  Salt Lake City   \n",
       "38           None                                 None         Orlando   \n",
       "39           None                                 None         Toronto   \n",
       "\n",
       "                     basketballteam  \n",
       "0                              None  \n",
       "1                        Miami Heat  \n",
       "2                    Boston Celtics  \n",
       "3                   New York Knicks  \n",
       "4               Cleveland Cavaliers  \n",
       "5                              None  \n",
       "6                              None  \n",
       "7                              None  \n",
       "8                              None  \n",
       "9                              None  \n",
       "10  {\"L.A. Lakers\",\"L.A. Clippers\"}  \n",
       "11                   Denver Nuggets  \n",
       "12                             None  \n",
       "13                             None  \n",
       "14                  Houston Rockets  \n",
       "15                   Indiana Pacers  \n",
       "16               Philadelphia 76ers  \n",
       "17                 Dallas Mavericks  \n",
       "18               Washington Wizards  \n",
       "19                    Atlanta Hawks  \n",
       "20                Charlotte Hornets  \n",
       "21                             None  \n",
       "22             New Orleans Pelicans  \n",
       "23            Golden State Warriors  \n",
       "24                     Phoenix Suns  \n",
       "25                             None  \n",
       "26                    Chicago Bulls  \n",
       "27                             None  \n",
       "28           Minnesota Timberwolves  \n",
       "29                  Detroit Pistons  \n",
       "30                  Milwaukee Bucks  \n",
       "31            Oklahoma City Thunder  \n",
       "32            Portland Trailblazers  \n",
       "33                    Brooklyn Nets  \n",
       "34                 Sacramento Kings  \n",
       "35                Memphis Grizzlies  \n",
       "36                San Antonio Spurs  \n",
       "37                        Utah Jazz  \n",
       "38                    Orlando Magic  \n",
       "39                  Toronto Raptors  "
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl f\n",
    "FULL JOIN nba b\n",
    "    ON f.city = b.city;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Although there are 30 cities with at least one NFL team and 29 cities with at least one NBA team, 19 cities have both, so the full join returns 30 + 29 - 19 = 40 rows: one for every city with at least one team from either of these two leagues."
   ]
  },
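  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For a system such as MySQL that lacks `FULL JOIN`, a commonly used workaround is to combine a left join and a right join with `UNION`. The query below is only a sketch under assumptions: it reuses the `nfl` and `nba` tables from this section, and the aliases `nfl_city` and `nba_city` are illustrative names chosen here so that the two `city` columns stay distinct. Rows that match in both tables are produced by both halves of the query, and `UNION` removes the repeated copies:\n",
    "```\n",
    "SELECT f.city AS nfl_city, f.footballteam, b.city AS nba_city, b.basketballteam\n",
    "FROM nfl f\n",
    "LEFT JOIN nba b\n",
    "    ON f.city = b.city\n",
    "UNION\n",
    "SELECT f.city AS nfl_city, f.footballteam, b.city AS nba_city, b.basketballteam\n",
    "FROM nfl f\n",
    "RIGHT JOIN nba b\n",
    "    ON f.city = b.city;\n",
    "```\n",
    "One caveat of this approach is that `UNION` also collapses any rows that were genuinely duplicated in the original tables, so the result can differ from a true full join when the data contain exact duplicate records."
   ]
  },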
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Anti-Joins\n",
    "An anti-join leaves us with all of the records in the first data table that have no match in the second table. There is no \"ANTI JOIN\" syntax in SQL, but the behavior of an anti-join can be generated by adding a `WHERE` clause to a `LEFT JOIN`. The syntax for an anti-join is\n",
    "```\n",
    "SELECT * FROM table1\n",
    "LEFT JOIN table2\n",
    "    ON table1.index_name = table2.index_name\n",
    "WHERE table2.index_name is NULL;\n",
    "```\n",
    "The `WHERE` clause selects the rows of a data table for which a specified logical condition is true. After performing a left join we have a data table with all of the rows in the first table, along with the data from the second table for those rows that had a match. Adding `WHERE table2.index_name is NULL` restricts this data table to only the rows in which the index from the second table is missing, meaning there was no match. For the teams database, the anti-join of the NFL and NBA tables yields a data frame of all the cities with an NFL team but no NBA team:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Cincinnati</td>\n",
       "      <td>Cincinnati Bengals</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Pittsburgh</td>\n",
       "      <td>Pittsburgh Steelers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Baltimore</td>\n",
       "      <td>Baltimore Ravens</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Kansas City</td>\n",
       "      <td>Kansas City Chiefs</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Las Vegas</td>\n",
       "      <td>Las Vegas Raiders</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Nashville</td>\n",
       "      <td>Tennessee Titans</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Jacksonville</td>\n",
       "      <td>Jacksonville Jaguars</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Tampa Bay</td>\n",
       "      <td>Tampa Bay Buccaneers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>Seattle Seahawks</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Green Bay</td>\n",
       "      <td>Green Bay Packers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            city          footballteam  city basketballteam\n",
       "0        Buffalo         Buffalo Bills  None           None\n",
       "1     Cincinnati    Cincinnati Bengals  None           None\n",
       "2     Pittsburgh   Pittsburgh Steelers  None           None\n",
       "3      Baltimore      Baltimore Ravens  None           None\n",
       "4    Kansas City    Kansas City Chiefs  None           None\n",
       "5      Las Vegas     Las Vegas Raiders  None           None\n",
       "6      Nashville      Tennessee Titans  None           None\n",
       "7   Jacksonville  Jacksonville Jaguars  None           None\n",
       "8      Tampa Bay  Tampa Bay Buccaneers  None           None\n",
       "9        Seattle      Seattle Seahawks  None           None\n",
       "10     Green Bay     Green Bay Packers  None           None"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl f\n",
    "LEFT JOIN nba b\n",
    "    ON f.city = b.city\n",
    "WHERE b.city is NULL;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
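  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The order of the tables matters for an anti-join. As a sketch, assuming the same `nfl` and `nba` tables, swapping the roles of the two tables should instead return the cities with an NBA team but no NFL team. Judging from the full join output above, these are the ten rows in which the `nfl` columns are missing (Milwaukee, Oklahoma City, Portland, and so on):\n",
    "```\n",
    "SELECT b.city, b.basketballteam FROM nba b\n",
    "LEFT JOIN nfl f\n",
    "    ON b.city = f.city\n",
    "WHERE f.city IS NULL;\n",
    "```\n",
    "Selecting only the columns from `nba` leaves out the `nfl` columns, which are entirely missing in an anti-join and so carry no information."
   ]
  },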
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Natural Joins\n",
    "One drawback of all of the joins shown above is that we end up with two columns that contain the same information. In the case of the teams database, we have two `city` columns that are either equal or else one is missing. But when one of the `city` columns says \"None\", the team from that table also says \"None\", so the missingness in the `city` column does not provide additional information.\n",
    "\n",
    "It might make sense to use a different kind of join that understands that the two `city` columns contain the same information and includes only one of these columns. A natural join does two things differently from the other joins described here:\n",
    "\n",
    "1. A natural join removes duplicated columns from the output data.\n",
    "\n",
    "2. A natural join detects the indices automatically by assuming that columns that share the same name are indices.\n",
    "\n",
    "If done correctly, a natural join saves some work constructing the query as the indices are detected automatically, and provides cleaner output. Any of the joins described above can be done as a natural join by adding `NATURAL` in front of `INNER`, `LEFT`, `RIGHT`, or `FULL`. If there are no columns that share the same name, a natural join instead performs a cross join (described below). \n",
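    "\n",
    "Following the pattern of the syntax blocks above, a natural left join might be sketched as shown here, with `table1` and `table2` as placeholder names. Note that there is no `ON` clause, because the matching columns are inferred from the shared column names:\n",
    "```\n",
    "SELECT * FROM table1\n",
    "NATURAL LEFT JOIN table2;\n",
    "```\n",
    "Because the matching columns are chosen implicitly, it is worth checking that the two tables share only the column names intended to serve as indices before relying on a natural join.\n",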
    "\n",
    "The following query performs a natural inner join:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam  \\\n",
       "0           Miami                       Miami Dolphins   \n",
       "1          Boston                 New England Patriots   \n",
       "2        New York  {\"New York Jets\",\"New York Giants\"}   \n",
       "3       Cleveland                     Cleveland Browns   \n",
       "4     Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}   \n",
       "5          Denver                       Denver Broncos   \n",
       "6         Houston                       Houston Texans   \n",
       "7    Indianapolis                   Indianapolis Colts   \n",
       "8    Philadelphia                  Philadelphia Eagles   \n",
       "9          Dallas                       Dallas Cowboys   \n",
       "10     Washington                     Washington Skins   \n",
       "11        Atlanta                      Atlanta Falcons   \n",
       "12      Charlotte                    Carolina Panthers   \n",
       "13    New Orleans                   New Orleans Saints   \n",
       "14  San Francisco                  San Francisco 49ers   \n",
       "15        Phoenix                    Arizona Cardinals   \n",
       "16        Chicago                        Chicago Bears   \n",
       "17    Minneapolis                    Minnesota Vikings   \n",
       "18        Detroit                        Detroit Lions   \n",
       "\n",
       "                     basketballteam  \n",
       "0                        Miami Heat  \n",
       "1                    Boston Celtics  \n",
       "2                   New York Knicks  \n",
       "3               Cleveland Cavaliers  \n",
       "4   {\"L.A. Lakers\",\"L.A. Clippers\"}  \n",
       "5                    Denver Nuggets  \n",
       "6                   Houston Rockets  \n",
       "7                    Indiana Pacers  \n",
       "8                Philadelphia 76ers  \n",
       "9                  Dallas Mavericks  \n",
       "10               Washington Wizards  \n",
       "11                    Atlanta Hawks  \n",
       "12                Charlotte Hornets  \n",
       "13             New Orleans Pelicans  \n",
       "14            Golden State Warriors  \n",
       "15                     Phoenix Suns  \n",
       "16                    Chicago Bulls  \n",
       "17           Minnesota Timberwolves  \n",
       "18                  Detroit Pistons  "
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl\n",
    "NATURAL INNER JOIN nba;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Natural joins are controversial, however, and many data scientists choose not to use them at all. The danger is that if two columns unexpectedly have the same name (it can be hard to keep track of all of the features' names in big databases) then a natural join will match on the wrong indices. This [Stack Overflow post](https://stackoverflow.com/questions/8696383/difference-between-natural-join-and-inner-join#8696402) gets into this debate, and one response made a forceful argument against natural joins:\n",
    "\n",
    "> Collapsing columns in the output is the least-important aspect of a natural join. The things you need to know are (A) it automatically joins on fields of the same name and (B) it will f--- up your s--- when you least expect it. In my world, using a natural join is grounds for dismissal. . . . Say you have a natural join between `Customers` and `Employees`, joining on `EmployeeID`. Employees also has a `ManagerID` field. Everything's fine. Then, some day, someone adds a `ManagerID` field to the `Customers` table. Your join will not break (that would be a mercy), instead it will now include a second field, and work incorrectly. Thus, a seemingly harmless change can break something only distantly related. VERY BAD. The only upside of a natural join is saving a little typing, and the downside is substantial.\n",
    "\n",
    "Personally, I disagree with this statement as I think natural joins can be elegant and convenient, especially when I want to match on multiple indices. But I agree that natural joins do make it easier to mess up a join, and more caution is needed. To demonstrate how a natural join can go wrong, suppose that in both the NFL and NBA tables the columns were named `city` and `team`. The following code creates versions of these tables with `footballteam` and `basketballteam` each renamed to `team` and stores these tables in the database as \"nfl2\" and \"nba2\": "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "nfl2 = pd.read_sql_query(\"SELECT city, footballteam as team FROM nfl;\", con=engine)\n",
    "nba2 = pd.read_sql_query(\"SELECT city, basketballteam as team FROM nba;\", con=engine)\n",
    "nfl2.to_sql('nfl2', con = engine, index=False, chunksize=1000, if_exists = 'replace')\n",
    "nba2.to_sql('nba2', con = engine, index=False, chunksize=1000, if_exists = 'replace')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now a natural inner join between \"nfl2\" and \"nba2\" yields a dataframe with no records:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>team</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [city, team]\n",
       "Index: []"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl2 \n",
    "NATURAL INNER JOIN nba2;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The reason why there are no records is that the natural join automatically chooses both `city` and `team` to be part of the index, and records are only kept in the inner join if they match on both city and team. There are many matches for city, but no matches for both city and team.\n",
    "\n",
    "In contrast, a regular inner join still works fine:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>team</th>\n",
       "      <th>city</th>\n",
       "      <th>team</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                                 team           city  \\\n",
       "0           Miami                       Miami Dolphins          Miami   \n",
       "1          Boston                 New England Patriots         Boston   \n",
       "2        New York  {\"New York Jets\",\"New York Giants\"}       New York   \n",
       "3       Cleveland                     Cleveland Browns      Cleveland   \n",
       "4     Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}    Los Angeles   \n",
       "5          Denver                       Denver Broncos         Denver   \n",
       "6         Houston                       Houston Texans        Houston   \n",
       "7    Indianapolis                   Indianapolis Colts   Indianapolis   \n",
       "8    Philadelphia                  Philadelphia Eagles   Philadelphia   \n",
       "9          Dallas                       Dallas Cowboys         Dallas   \n",
       "10     Washington                     Washington Skins     Washington   \n",
       "11        Atlanta                      Atlanta Falcons        Atlanta   \n",
       "12      Charlotte                    Carolina Panthers      Charlotte   \n",
       "13    New Orleans                   New Orleans Saints    New Orleans   \n",
       "14  San Francisco                  San Francisco 49ers  San Francisco   \n",
       "15        Phoenix                    Arizona Cardinals        Phoenix   \n",
       "16        Chicago                        Chicago Bears        Chicago   \n",
       "17    Minneapolis                    Minnesota Vikings    Minneapolis   \n",
       "18        Detroit                        Detroit Lions        Detroit   \n",
       "\n",
       "                               team  \n",
       "0                        Miami Heat  \n",
       "1                    Boston Celtics  \n",
       "2                   New York Knicks  \n",
       "3               Cleveland Cavaliers  \n",
       "4   {\"L.A. Lakers\",\"L.A. Clippers\"}  \n",
       "5                    Denver Nuggets  \n",
       "6                   Houston Rockets  \n",
       "7                    Indiana Pacers  \n",
       "8                Philadelphia 76ers  \n",
       "9                  Dallas Mavericks  \n",
       "10               Washington Wizards  \n",
       "11                    Atlanta Hawks  \n",
       "12                Charlotte Hornets  \n",
       "13             New Orleans Pelicans  \n",
       "14            Golden State Warriors  \n",
       "15                     Phoenix Suns  \n",
       "16                    Chicago Bulls  \n",
       "17           Minnesota Timberwolves  \n",
       "18                  Detroit Pistons  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl2 f\n",
    "INNER JOIN nba2 b\n",
    "    ON f.city = b.city;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To safely use natural joins, first make certain that the indices you intend to match on have the same name, and then make sure that no other columns in the two data tables share a name."
   ]
  },
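  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to run that check before writing a natural join is to list the column names the two tables have in common. The following is a minimal sketch, not the setup used in this chapter: it assumes Python's built-in `sqlite3` module with an in-memory database, two-row stand-ins for the `nfl2` and `nba2` tables, and a hypothetical helper named `shared_columns`.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Assumption: an in-memory SQLite database stands in for the teams database\n",
    "conn = sqlite3.connect(':memory:')\n",
    "conn.executescript('''\n",
    "CREATE TABLE nfl2 (city TEXT, team TEXT);\n",
    "CREATE TABLE nba2 (city TEXT, team TEXT);\n",
    "INSERT INTO nfl2 VALUES ('Miami', 'Miami Dolphins'), ('Boston', 'New England Patriots');\n",
    "INSERT INTO nba2 VALUES ('Miami', 'Miami Heat'), ('Boston', 'Boston Celtics');\n",
    "''')\n",
    "\n",
    "def shared_columns(conn, left, right):\n",
    "    # Hypothetical helper: PRAGMA table_info returns one row per column,\n",
    "    # with the column name in position 1\n",
    "    left_cols = [row[1] for row in conn.execute(f'PRAGMA table_info({left})')]\n",
    "    right_cols = {row[1] for row in conn.execute(f'PRAGMA table_info({right})')}\n",
    "    return [c for c in left_cols if c in right_cols]\n",
    "\n",
    "# These are the columns a natural join would match on\n",
    "shared = shared_columns(conn, 'nfl2', 'nba2')\n",
    "\n",
    "# The natural join matches on every shared column; USING names the key explicitly\n",
    "natural_rows = conn.execute('SELECT * FROM nfl2 NATURAL INNER JOIN nba2').fetchall()\n",
    "using_rows = conn.execute('SELECT * FROM nfl2 INNER JOIN nba2 USING (city)').fetchall()\n",
    "```\n",
    "\n",
    "Here `shared` contains both `city` and `team`, which is the warning sign: the natural join returns no rows, while the join with `USING (city)` matches only on the intended column and returns one row per city."
   ]
  },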
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Cross Joins\n",
    "A **round robin** is a method of organizing a competitive tournament in which every team or participant plays every other team or participant once. A cross join, also called a Cartesian product, is a round robin for the rows of two data tables: every row in the first data table is paired once with every row in the second data table, regardless of whether the index values match. Cross joins are memory-intensive: if the first data table has $M$ rows and the second data table has $N$ rows, the cross join output is a data table with $M\\times N$ rows. In general, cross joins are not a good way to combine data entities because they do not restrict matches to like units. But cross joins are useful for constructing data that contain all possible pairings, if that is what a situation calls for.\n",
    "\n",
    "The syntax for generating a cross join is\n",
    "```\n",
    "SELECT * FROM table1\n",
    "CROSS JOIN table2;\n",
    "```\n",
    "There is no `ON` clause in this query because no matching condition is needed: each row in `table1` is paired with every row in `table2`. For the teams database, the cross join generates the following output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>Brooklyn Nets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Toronto Raptors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>865</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>866</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>867</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Salt Lake City</td>\n",
       "      <td>Utah Jazz</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>868</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Memphis</td>\n",
       "      <td>Memphis Grizzlies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>869</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>870 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        city   footballteam            city          basketballteam\n",
       "0    Buffalo  Buffalo Bills          Boston          Boston Celtics\n",
       "1    Buffalo  Buffalo Bills        New York         New York Knicks\n",
       "2    Buffalo  Buffalo Bills    Philadelphia      Philadelphia 76ers\n",
       "3    Buffalo  Buffalo Bills        Brooklyn           Brooklyn Nets\n",
       "4    Buffalo  Buffalo Bills         Toronto         Toronto Raptors\n",
       "..       ...            ...             ...                     ...\n",
       "865  Detroit  Detroit Lions     Minneapolis  Minnesota Timberwolves\n",
       "866  Detroit  Detroit Lions          Denver          Denver Nuggets\n",
       "867  Detroit  Detroit Lions  Salt Lake City               Utah Jazz\n",
       "868  Detroit  Detroit Lions         Memphis       Memphis Grizzlies\n",
       "869  Detroit  Detroit Lions     New Orleans    New Orleans Pelicans\n",
       "\n",
       "[870 rows x 4 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl\n",
    "CROSS JOIN nba;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
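  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The row count of a cross join is easy to verify on a small example. The following is a minimal sketch that assumes Python's built-in `sqlite3` module with an in-memory database instead of the teams database; the table names `left_table` and `right_table` and their contents are hypothetical.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Assumption: two tiny illustrative tables in an in-memory SQLite database\n",
    "conn = sqlite3.connect(':memory:')\n",
    "conn.executescript('''\n",
    "CREATE TABLE left_table (city TEXT);\n",
    "CREATE TABLE right_table (city TEXT);\n",
    "INSERT INTO left_table VALUES ('Miami'), ('Boston'), ('Denver');\n",
    "INSERT INTO right_table VALUES ('Miami'), ('Toronto');\n",
    "''')\n",
    "\n",
    "# Count the rows in each input table\n",
    "m = conn.execute('SELECT COUNT(*) FROM left_table').fetchone()[0]\n",
    "n = conn.execute('SELECT COUNT(*) FROM right_table').fetchone()[0]\n",
    "\n",
    "# Pair every row of left_table with every row of right_table\n",
    "pairs = conn.execute('SELECT * FROM left_table CROSS JOIN right_table').fetchall()\n",
    "print(m, n, len(pairs))\n",
    "```\n",
    "\n",
    "With 3 rows in one table and 2 in the other, the cross join has $3 \\times 2 = 6$ rows. The same arithmetic explains why row counts grow so quickly when both input tables are large."
   ]
  },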
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multiple Joins in One Query\n",
    "All of the examples above show a single join between two data tables, but many situations will require you to join multiple tables. It is possible to join many tables in one SQL query. The syntax to perform an inner join between two tables, then an inner join between the result and a third table is\n",
    "```\n",
    "SELECT * FROM table1\n",
    "INNER JOIN table2\n",
    "    ON table1.index_name = table2.index_name\n",
    "INNER JOIN table3\n",
    "    ON table1.index_name = table3.index_name;\n",
    "```\n",
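    "\n",
    "As a self-contained sketch of this pattern, the example below chains two inner joins and lists the output columns explicitly so that the shared key appears only once. It assumes Python's built-in `sqlite3` module with an in-memory database; the tables `t1`, `t2`, and `t3` and their rows are hypothetical stand-ins, not the teams database.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Assumption: three tiny illustrative tables that share a city column\n",
    "conn = sqlite3.connect(':memory:')\n",
    "conn.executescript('''\n",
    "CREATE TABLE t1 (city TEXT, footballteam TEXT);\n",
    "CREATE TABLE t2 (city TEXT, basketballteam TEXT);\n",
    "CREATE TABLE t3 (city TEXT, baseballteam TEXT);\n",
    "INSERT INTO t1 VALUES ('Miami', 'Miami Dolphins'), ('Buffalo', 'Buffalo Bills');\n",
    "INSERT INTO t2 VALUES ('Miami', 'Miami Heat'), ('Toronto', 'Toronto Raptors');\n",
    "INSERT INTO t3 VALUES ('Miami', 'Miami Marlins'), ('Toronto', 'Toronto Blue Jays');\n",
    "''')\n",
    "\n",
    "# Each additional INNER JOIN narrows the result to keys present in every table so far\n",
    "query = '''\n",
    "SELECT a.city, a.footballteam, b.basketballteam, c.baseballteam\n",
    "FROM t1 a\n",
    "INNER JOIN t2 b ON a.city = b.city\n",
    "INNER JOIN t3 c ON a.city = c.city;\n",
    "'''\n",
    "rows = conn.execute(query).fetchall()\n",
    "```\n",
    "\n",
    "Only Miami appears in all three hypothetical tables, so `rows` holds a single record with one `city` column followed by the three team columns.\n",
    "\n",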
    "To demonstrate how multiple joins can work, I add a third table to the teams database that contains all of the Major League Baseball teams:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "mlb_dict = {'city': ['New York', 'Boston', 'Toronto', 'Baltimore', 'Tampa Bay',\n",
    "                     'Cleveland', 'Chicago', 'Kansas City', 'Minneapolis', 'Detroit',\n",
    "                     'Houston', 'Anaheim', 'Dallas', 'Seattle', 'Oakland',\n",
    "                     'Philadelphia', 'Miami', 'Washington', 'Atlanta', 'Cincinnati',\n",
    "                     'Milwaukee', 'St. Louis', 'Pittsburgh', 'Los Angeles', 'San Francisco',\n",
    "                     'San Diego', 'Denver', 'Phoenix'],\n",
    "           'baseballteam': [['New York Mets', 'New York Yankees'], 'Boston Red Sox', 'Toronto Blue Jays',\n",
    "                            'Baltimore Orioles', 'Tampa Bay Rays', 'Cleveland Indians', \n",
    "                             ['Chicago White Sox', 'Chicago Cubs'], 'Kansas City Royals', 'Minnesota Twins',\n",
    "                            'Detriot Tigers', 'Houston Astros', 'Anaheim Angels', 'Texas Rangers', \n",
    "                            'Seattle Mariners', 'Oakland Athletics', 'Philadelphia Phillies',\n",
    "                            'Miami Marlins', 'Washington Nationals', 'Atlanta Braves', 'Cincinnati Reds',\n",
    "                            'Milwaukee Brewers', 'St. Louis Cardinals', 'Pittsburgh Pirates', 'Los Angeles Dodgers',\n",
    "                            'San Francisco Giants', 'San Diego Padres', 'Colorado Rockies', 'Arizona Diamondbacks']}\n",
    "mlb_df = pd.DataFrame(mlb_dict)\n",
    "mlb_df.to_sql('mlb', con = engine, index=False, chunksize=1000, if_exists = 'replace')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can first inner join the NFL and NBA data tables to keep only the cities with both an NFL and an NBA team, then we can inner join the result with the MLB data to keep only the cities with teams in all three sports:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>baseballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Braves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Red Sox</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>{\"Chicago White Sox\",\"Chicago Cubs\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Indians</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Texas Rangers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Colorado Rockies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detriot Tigers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Astros</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>Los Angeles Dodgers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Marlins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Twins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Mets\",\"New York Yankees\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Phillies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Diamondbacks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco Giants</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Nationals</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam           city  \\\n",
       "0         Atlanta                      Atlanta Falcons        Atlanta   \n",
       "1          Boston                 New England Patriots         Boston   \n",
       "2         Chicago                        Chicago Bears        Chicago   \n",
       "3       Cleveland                     Cleveland Browns      Cleveland   \n",
       "4          Dallas                       Dallas Cowboys         Dallas   \n",
       "5          Denver                       Denver Broncos         Denver   \n",
       "6         Detroit                        Detroit Lions        Detroit   \n",
       "7         Houston                       Houston Texans        Houston   \n",
       "8     Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}    Los Angeles   \n",
       "9           Miami                       Miami Dolphins          Miami   \n",
       "10    Minneapolis                    Minnesota Vikings    Minneapolis   \n",
       "11       New York  {\"New York Jets\",\"New York Giants\"}       New York   \n",
       "12   Philadelphia                  Philadelphia Eagles   Philadelphia   \n",
       "13        Phoenix                    Arizona Cardinals        Phoenix   \n",
       "14  San Francisco                  San Francisco 49ers  San Francisco   \n",
       "15     Washington                     Washington Skins     Washington   \n",
       "\n",
       "                     basketballteam           city  \\\n",
       "0                     Atlanta Hawks        Atlanta   \n",
       "1                    Boston Celtics         Boston   \n",
       "2                     Chicago Bulls        Chicago   \n",
       "3               Cleveland Cavaliers      Cleveland   \n",
       "4                  Dallas Mavericks         Dallas   \n",
       "5                    Denver Nuggets         Denver   \n",
       "6                   Detroit Pistons        Detroit   \n",
       "7                   Houston Rockets        Houston   \n",
       "8   {\"L.A. Lakers\",\"L.A. Clippers\"}    Los Angeles   \n",
       "9                        Miami Heat          Miami   \n",
       "10           Minnesota Timberwolves    Minneapolis   \n",
       "11                  New York Knicks       New York   \n",
       "12               Philadelphia 76ers   Philadelphia   \n",
       "13                     Phoenix Suns        Phoenix   \n",
       "14            Golden State Warriors  San Francisco   \n",
       "15               Washington Wizards     Washington   \n",
       "\n",
       "                            baseballteam  \n",
       "0                         Atlanta Braves  \n",
       "1                         Boston Red Sox  \n",
       "2   {\"Chicago White Sox\",\"Chicago Cubs\"}  \n",
       "3                      Cleveland Indians  \n",
       "4                          Texas Rangers  \n",
       "5                       Colorado Rockies  \n",
       "6                         Detriot Tigers  \n",
       "7                         Houston Astros  \n",
       "8                    Los Angeles Dodgers  \n",
       "9                          Miami Marlins  \n",
       "10                       Minnesota Twins  \n",
       "11  {\"New York Mets\",\"New York Yankees\"}  \n",
       "12                 Philadelphia Phillies  \n",
       "13                  Arizona Diamondbacks  \n",
       "14                  San Francisco Giants  \n",
       "15                  Washington Nationals  "
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl f\n",
    "INNER JOIN nba b\n",
    "    ON f.city = b.city\n",
    "INNER JOIN mlb m\n",
    "    ON f.city = m.city;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Things get more complicated when we consider left, right, and full joins across more than two tables. The trick is to think about the set of records that is required, to express that set in set-theoretic notation, and to find the combination of joins that matches that statement.\n",
    "\n",
    "For example, to obtain all cities with both an NFL and an NBA team, also listing the MLB team if one exists in that city, we first inner join the NFL table to the NBA table, then left join the MLB table on the city column of either the NFL or the NBA table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>baseballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Braves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Red Sox</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "      <td>Chicago</td>\n",
       "      <td>{\"Chicago White Sox\",\"Chicago Cubs\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Indians</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "      <td>Dallas</td>\n",
       "      <td>Texas Rangers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "      <td>Denver</td>\n",
       "      <td>Colorado Rockies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detriot Tigers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Astros</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>Los Angeles Dodgers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Marlins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Twins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Mets\",\"New York Yankees\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Phillies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Diamondbacks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco Giants</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Nationals</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam           city  \\\n",
       "0         Atlanta                      Atlanta Falcons        Atlanta   \n",
       "1          Boston                 New England Patriots         Boston   \n",
       "2       Charlotte                    Carolina Panthers      Charlotte   \n",
       "3         Chicago                        Chicago Bears        Chicago   \n",
       "4       Cleveland                     Cleveland Browns      Cleveland   \n",
       "5          Dallas                       Dallas Cowboys         Dallas   \n",
       "6          Denver                       Denver Broncos         Denver   \n",
       "7         Detroit                        Detroit Lions        Detroit   \n",
       "8         Houston                       Houston Texans        Houston   \n",
       "9    Indianapolis                   Indianapolis Colts   Indianapolis   \n",
       "10    Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}    Los Angeles   \n",
       "11          Miami                       Miami Dolphins          Miami   \n",
       "12    Minneapolis                    Minnesota Vikings    Minneapolis   \n",
       "13    New Orleans                   New Orleans Saints    New Orleans   \n",
       "14       New York  {\"New York Jets\",\"New York Giants\"}       New York   \n",
       "15   Philadelphia                  Philadelphia Eagles   Philadelphia   \n",
       "16        Phoenix                    Arizona Cardinals        Phoenix   \n",
       "17  San Francisco                  San Francisco 49ers  San Francisco   \n",
       "18     Washington                     Washington Skins     Washington   \n",
       "\n",
       "                     basketballteam           city  \\\n",
       "0                     Atlanta Hawks        Atlanta   \n",
       "1                    Boston Celtics         Boston   \n",
       "2                 Charlotte Hornets           None   \n",
       "3                     Chicago Bulls        Chicago   \n",
       "4               Cleveland Cavaliers      Cleveland   \n",
       "5                  Dallas Mavericks         Dallas   \n",
       "6                    Denver Nuggets         Denver   \n",
       "7                   Detroit Pistons        Detroit   \n",
       "8                   Houston Rockets        Houston   \n",
       "9                    Indiana Pacers           None   \n",
       "10  {\"L.A. Lakers\",\"L.A. Clippers\"}    Los Angeles   \n",
       "11                       Miami Heat          Miami   \n",
       "12           Minnesota Timberwolves    Minneapolis   \n",
       "13             New Orleans Pelicans           None   \n",
       "14                  New York Knicks       New York   \n",
       "15               Philadelphia 76ers   Philadelphia   \n",
       "16                     Phoenix Suns        Phoenix   \n",
       "17            Golden State Warriors  San Francisco   \n",
       "18               Washington Wizards     Washington   \n",
       "\n",
       "                            baseballteam  \n",
       "0                         Atlanta Braves  \n",
       "1                         Boston Red Sox  \n",
       "2                                   None  \n",
       "3   {\"Chicago White Sox\",\"Chicago Cubs\"}  \n",
       "4                      Cleveland Indians  \n",
       "5                          Texas Rangers  \n",
       "6                       Colorado Rockies  \n",
       "7                         Detriot Tigers  \n",
       "8                         Houston Astros  \n",
       "9                                   None  \n",
       "10                   Los Angeles Dodgers  \n",
       "11                         Miami Marlins  \n",
       "12                       Minnesota Twins  \n",
       "13                                  None  \n",
       "14  {\"New York Mets\",\"New York Yankees\"}  \n",
       "15                 Philadelphia Phillies  \n",
       "16                  Arizona Diamondbacks  \n",
       "17                  San Francisco Giants  \n",
       "18                  Washington Nationals  "
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM nfl f\n",
    "INNER JOIN nba b\n",
    "    ON f.city = b.city\n",
    "LEFT JOIN mlb m\n",
    "    ON f.city = m.city;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To narrow the records to cities with a baseball team and a basketball team, but no football team, we first inner join the MLB and NBA data tables, then perform an anti-join with the NFL data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>baseballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Toronto</td>\n",
       "      <td>Toronto Blue Jays</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>Toronto Raptors</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Milwaukee</td>\n",
       "      <td>Milwaukee Brewers</td>\n",
       "      <td>Milwaukee</td>\n",
       "      <td>Milwaukee Bucks</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        city       baseballteam       city   basketballteam  city footballteam\n",
       "0    Toronto  Toronto Blue Jays    Toronto  Toronto Raptors  None         None\n",
       "1  Milwaukee  Milwaukee Brewers  Milwaukee  Milwaukee Bucks  None         None"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM mlb m\n",
    "INNER JOIN nba b\n",
    "    ON m.city = b.city\n",
    "LEFT JOIN nfl f\n",
    "    ON m.city = f.city\n",
    "WHERE f.city IS NULL;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
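  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An anti-join can also be written with `NOT EXISTS`, which reads closely to the set-theoretic statement of cities in MLB and NBA but not in NFL. The following is a minimal sketch rather than a query against the teams database: it assumes a throwaway in-memory SQLite database (through Python's built-in `sqlite3` module) that holds a handful of rows copied from the outputs above, and the names `conn`, `cur`, `anti_join`, and `no_football` are introduced only for this example.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Toy in-memory database (an assumption for illustration, not the teams database)\n",
    "conn = sqlite3.connect(':memory:')\n",
    "cur = conn.cursor()\n",
    "cur.execute('CREATE TABLE mlb (city TEXT, baseballteam TEXT)')\n",
    "cur.execute('CREATE TABLE nba (city TEXT, basketballteam TEXT)')\n",
    "cur.execute('CREATE TABLE nfl (city TEXT, footballteam TEXT)')\n",
    "cur.executemany('INSERT INTO mlb VALUES (?, ?)', [\n",
    "    ('Atlanta', 'Atlanta Braves'),\n",
    "    ('Toronto', 'Toronto Blue Jays'),\n",
    "    ('Milwaukee', 'Milwaukee Brewers'),\n",
    "])\n",
    "cur.executemany('INSERT INTO nba VALUES (?, ?)', [\n",
    "    ('Atlanta', 'Atlanta Hawks'),\n",
    "    ('Toronto', 'Toronto Raptors'),\n",
    "    ('Milwaukee', 'Milwaukee Bucks'),\n",
    "    ('Charlotte', 'Charlotte Hornets'),\n",
    "])\n",
    "cur.executemany('INSERT INTO nfl VALUES (?, ?)', [\n",
    "    ('Atlanta', 'Atlanta Falcons'),\n",
    "    ('Charlotte', 'Carolina Panthers'),\n",
    "])\n",
    "\n",
    "# Keep MLB/NBA matches only when no NFL row exists for the same city\n",
    "anti_join = '''\n",
    "SELECT m.city, m.baseballteam, b.basketballteam\n",
    "FROM mlb m\n",
    "INNER JOIN nba b\n",
    "    ON m.city = b.city\n",
    "WHERE NOT EXISTS (\n",
    "    SELECT 1 FROM nfl f WHERE f.city = m.city\n",
    ")\n",
    "ORDER BY m.city;\n",
    "'''\n",
    "no_football = cur.execute(anti_join).fetchall()\n",
    "print(no_football)\n",
    "conn.close()\n",
    "```\n",
    "\n",
    "On these toy rows the query keeps only Milwaukee and Toronto, the same cities returned by the `LEFT JOIN` with `WHERE f.city IS NULL` above."
   ]
  },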
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Joins on More Than One Index\n",
    "Sometimes more than one column comprises the primary key for a table. The general syntax for joining two tables on more than one index adds an `AND` condition to the `ON` clause of the standard SQL join syntax:\n",
    "```\n",
    "SELECT * FROM table1\n",
    "INNER JOIN table2\n",
    "    ON table1.index1 = table2.index2\n",
    "        AND table1.anotherindex1 = table2.anotherindex2;\n",
    "```\n",
    "Suppose, for example, that the NBA table and MLB table also contained records for minor league teams in the NBA G League or the MLB AAA system. Some cities have both major and minor league teams in the same sport. Washington, for example, has a major league NBA team, the Wizards, and a minor league basketball team, the Capital City Go-Go. Suppose that both the NBA and MLB tables have a column `leaguetype` that marks each team as \"major\" or \"minor\", and that we want to match on both city and league type. The syntax to do so is\n",
    "```\n",
    "SELECT * FROM nba b\n",
    "INNER JOIN mlb m\n",
    "    ON b.city = m.city\n",
    "        AND b.leaguetype = m.leaguetype;\n",
    "```"
   ]
  },
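  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of the extra `AND` condition is easiest to see on a small example. The sketch below is hypothetical: the teams database used in this notebook has no `leaguetype` column, so the code builds two stand-in tables in an in-memory SQLite database with a few illustrative rows, then compares a join on `city` alone with a join on both `city` and `leaguetype`.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "conn = sqlite3.connect(':memory:')\n",
    "cur = conn.cursor()\n",
    "# Hypothetical tables with the extra leaguetype column described above\n",
    "cur.execute('CREATE TABLE nba (city TEXT, leaguetype TEXT, basketballteam TEXT)')\n",
    "cur.execute('CREATE TABLE mlb (city TEXT, leaguetype TEXT, baseballteam TEXT)')\n",
    "cur.executemany('INSERT INTO nba VALUES (?, ?, ?)', [\n",
    "    ('Washington', 'major', 'Washington Wizards'),\n",
    "    ('Washington', 'minor', 'Capital City Go-Go'),\n",
    "    ('Memphis', 'major', 'Memphis Grizzlies'),\n",
    "    ('Memphis', 'minor', 'Memphis Hustle'),\n",
    "])\n",
    "cur.executemany('INSERT INTO mlb VALUES (?, ?, ?)', [\n",
    "    ('Washington', 'major', 'Washington Nationals'),\n",
    "    ('Memphis', 'minor', 'Memphis Redbirds'),\n",
    "])\n",
    "\n",
    "# Matching on city alone pairs every NBA row with every MLB row in that city\n",
    "city_only = cur.execute('''\n",
    "SELECT b.basketballteam, m.baseballteam FROM nba b\n",
    "INNER JOIN mlb m\n",
    "    ON b.city = m.city\n",
    "ORDER BY b.basketballteam;\n",
    "''').fetchall()\n",
    "\n",
    "# Matching on city AND leaguetype keeps only pairs at the same level\n",
    "city_and_league = cur.execute('''\n",
    "SELECT b.basketballteam, m.baseballteam FROM nba b\n",
    "INNER JOIN mlb m\n",
    "    ON b.city = m.city\n",
    "        AND b.leaguetype = m.leaguetype\n",
    "ORDER BY b.basketballteam;\n",
    "''').fetchall()\n",
    "\n",
    "print(len(city_only))\n",
    "print(city_and_league)\n",
    "conn.close()\n",
    "```\n",
    "\n",
    "With these rows, the join on `city` alone produces four pairs, including mismatched ones such as a minor league basketball team next to a major league baseball team. Adding the `leaguetype` condition leaves two pairs, one major and one minor."
   ]
  },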
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SQL Create, Update, and Delete Operations\n",
    "Once a database exists and is populated with data, most changes to the data will be small and incremental. We might add a few records, edit a couple, or delete one or two. There are straightforward SQL commands for creating, updating, and deleting records. To issue these queries, however, we cannot use the `pd.read_sql_query()` function as this function is only for read operations. Instead, we can use the `.execute()` method as applied to either the cursor for the database we are working with, or the `sqlalchemy` engine. Specific examples are shown below."
   ]
  },
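  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One practical caveat: SQLAlchemy 2.0 and later no longer provide `engine.execute()`, so on a recent installation the same queries are issued through an explicit connection, with the SQL string wrapped in `text()`. The sketch below illustrates that pattern under stated assumptions and is not this notebook's own setup: it uses a throwaway in-memory SQLite database, a small stand-in `nba` table, and the hypothetical names `demo_engine` and `remaining`.\n",
    "\n",
    "```python\n",
    "from sqlalchemy import create_engine, text\n",
    "\n",
    "# Throwaway in-memory SQLite database (an assumption, not the teams database)\n",
    "demo_engine = create_engine('sqlite://')\n",
    "\n",
    "# engine.begin() opens a connection and commits when the block exits cleanly\n",
    "with demo_engine.begin() as conn:\n",
    "    conn.execute(text('CREATE TABLE nba (city TEXT, basketballteam TEXT)'))\n",
    "    # Create: one dictionary of named parameters per new row\n",
    "    conn.execute(\n",
    "        text('INSERT INTO nba (city, basketballteam) VALUES (:city, :team)'),\n",
    "        [{'city': 'Seattle', 'team': 'Seattle Supersonics'},\n",
    "         {'city': 'Charlotte', 'team': 'Charlotte Hornets'}],\n",
    "    )\n",
    "    # Update: change the team name only where the city matches\n",
    "    conn.execute(\n",
    "        text('UPDATE nba SET basketballteam = :team WHERE city = :city'),\n",
    "        {'city': 'Charlotte', 'team': 'Charlotte Bobcats'},\n",
    "    )\n",
    "    # Delete: remove the rows where the city matches\n",
    "    conn.execute(text('DELETE FROM nba WHERE city = :city'), {'city': 'Seattle'})\n",
    "\n",
    "# Read the table back to confirm the changes\n",
    "with demo_engine.connect() as conn:\n",
    "    remaining = [tuple(row) for row in conn.execute(text('SELECT * FROM nba'))]\n",
    "print(remaining)\n",
    "```\n",
    "\n",
    "The `engine.begin()` block commits the changes when it finishes without an error, which stands in for the automatic commit that the older `engine.execute()` call performed."
   ]
  },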
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating New Records\n",
    "An existing database has a schema, an overarching organizational blueprint for the database, that describes the different tables in the database, and within each table what the columns are and what kinds of data can be input into the columns. Creating new data generally works within an established schema. That means we enter new datapoints into existing columns, matching the data type that must exist in those columns.\n",
    "\n",
    "The SQL syntax to create new data is\n",
    "```\n",
    "INSERT INTO table (column1, column2, ...)\n",
    "    VALUES (value1, value2, ...);\n",
    "```\n",
    "This syntax requires us to specify the key elements of the schema that identify a location in the database: the table and the columns. The values need to be listed in the same order as the columns, and character values need to be enclosed in single quotes.\n",
    "\n",
    "To add a new observation to the NBA table (bring back the Sonics!) we can type:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<sqlalchemy.engine.result.ResultProxy at 0x1153b5390>"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "INSERT INTO nba (city, basketballteam)\n",
    "    VALUES ('Seattle', 'Seattle Supersonics');\n",
    "\"\"\"\n",
    "engine.execute(myquery)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here the `engine` variable is the `sqlalchemy` connection we previously established for the teams database. We can use the `execute()` method to pass SQL queries to the database, just as we can with a cursor. Now, when we look at the data, we see the Seattle Supersonics included along with all the other NBA teams:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>Brooklyn Nets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Toronto</td>\n",
       "      <td>Toronto Raptors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Milwaukee</td>\n",
       "      <td>Milwaukee Bucks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Orlando</td>\n",
       "      <td>Orlando Magic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Hornets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Portland</td>\n",
       "      <td>Portland Trailblazers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Sacramento</td>\n",
       "      <td>Sacramento Kings</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>San Antonio</td>\n",
       "      <td>San Antonio Spurs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Oklahoma City</td>\n",
       "      <td>Oklahoma City Thunder</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Salt Lake City</td>\n",
       "      <td>Utah Jazz</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Memphis</td>\n",
       "      <td>Memphis Grizzlies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>Seattle Supersonics</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              city                   basketballteam\n",
       "0           Boston                   Boston Celtics\n",
       "1         New York                  New York Knicks\n",
       "2     Philadelphia               Philadelphia 76ers\n",
       "3         Brooklyn                    Brooklyn Nets\n",
       "4          Toronto                  Toronto Raptors\n",
       "5        Cleveland              Cleveland Cavaliers\n",
       "6          Chicago                    Chicago Bulls\n",
       "7          Detroit                  Detroit Pistons\n",
       "8        Milwaukee                  Milwaukee Bucks\n",
       "9     Indianapolis                   Indiana Pacers\n",
       "10         Atlanta                    Atlanta Hawks\n",
       "11      Washington               Washington Wizards\n",
       "12         Orlando                    Orlando Magic\n",
       "13           Miami                       Miami Heat\n",
       "14       Charlotte                Charlotte Hornets\n",
       "15     Los Angeles  {\"L.A. Lakers\",\"L.A. Clippers\"}\n",
       "16   San Francisco            Golden State Warriors\n",
       "17        Portland            Portland Trailblazers\n",
       "18      Sacramento                 Sacramento Kings\n",
       "19         Phoenix                     Phoenix Suns\n",
       "20     San Antonio                San Antonio Spurs\n",
       "21          Dallas                 Dallas Mavericks\n",
       "22         Houston                  Houston Rockets\n",
       "23   Oklahoma City            Oklahoma City Thunder\n",
       "24     Minneapolis           Minnesota Timberwolves\n",
       "25          Denver                   Denver Nuggets\n",
       "26  Salt Lake City                        Utah Jazz\n",
       "27         Memphis                Memphis Grizzlies\n",
       "28     New Orleans             New Orleans Pelicans\n",
       "29         Seattle              Seattle Supersonics"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_sql_query(\"SELECT * FROM nba\", con=engine)"
   ]
  },
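  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `INSERT` query above writes the values directly into the query string. When the values come from Python variables, a common alternative is to pass them separately as parameters so that the database driver handles the quoting. The following is a minimal sketch, assuming an in-memory SQLite database and the `?` placeholder style of Python's built-in `sqlite3` module (other drivers use different placeholder styles). The `new_team` and `inserted` variables are names introduced only for this example.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Toy in-memory table standing in for the NBA table (an assumption)\n",
    "conn = sqlite3.connect(':memory:')\n",
    "cur = conn.cursor()\n",
    "cur.execute('CREATE TABLE nba (city TEXT, basketballteam TEXT)')\n",
    "\n",
    "# Values are supplied in the same order as the listed columns\n",
    "new_team = ('Seattle', 'Seattle Supersonics')\n",
    "cur.execute('INSERT INTO nba (city, basketballteam) VALUES (?, ?)', new_team)\n",
    "conn.commit()\n",
    "\n",
    "# Read the table back to confirm the new row\n",
    "inserted = cur.execute('SELECT city, basketballteam FROM nba').fetchall()\n",
    "print(inserted)\n",
    "conn.close()\n",
    "```\n",
    "\n",
    "Because the values travel separately from the SQL text, no single quotes are needed around the character values in the query itself."
   ]
  },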
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Editing Existing Records\n",
    "Sometimes, instead of creating a new record, we want to edit an existing one. To revise a record, we use the following SQL syntax:\n",
    "```\n",
    "UPDATE table\n",
    "    SET column2 = newvalue\n",
    "    WHERE logicalcondition;\n",
    "```\n",
    "In this case, `SET` specifies the change we want to make to a particular column. But we don't want to change *all* of the values of the column, so we use `WHERE` to specify a logical condition to identify the rows we want to change. A logical condition is a statement that is true on some rows and false on others, and the data update happens only on the rows for which the condition is true. \n",
    "\n",
    "Suppose we want to change the name of the Charlotte Hornets back to the Charlotte Bobcats (sorry, Charlotte). We can use the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<sqlalchemy.engine.result.ResultProxy at 0x115432c88>"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "UPDATE nba\n",
    "    SET basketballteam = 'Charlotte Bobcats'\n",
    "    WHERE city = 'Charlotte';\n",
    "\"\"\"\n",
    "engine.execute(myquery)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here the query says to update values in the NBA table by changing `basketballteam` to Charlotte Bobcats, but only when `city` is Charlotte. This update now appears in the NBA data:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>basketballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Boston</td>\n",
       "      <td>Boston Celtics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>New York</td>\n",
       "      <td>New York Knicks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia 76ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Brooklyn</td>\n",
       "      <td>Brooklyn Nets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Toronto</td>\n",
       "      <td>Toronto Raptors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Cavaliers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bulls</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Pistons</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Milwaukee</td>\n",
       "      <td>Milwaukee Bucks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indiana Pacers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Hawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Wizards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Orlando</td>\n",
       "      <td>Orlando Magic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Heat</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Lakers\",\"L.A. Clippers\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>Golden State Warriors</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Portland</td>\n",
       "      <td>Portland Trailblazers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Sacramento</td>\n",
       "      <td>Sacramento Kings</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Phoenix Suns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>San Antonio</td>\n",
       "      <td>San Antonio Spurs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Mavericks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Rockets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Oklahoma City</td>\n",
       "      <td>Oklahoma City Thunder</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Timberwolves</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Nuggets</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Salt Lake City</td>\n",
       "      <td>Utah Jazz</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Memphis</td>\n",
       "      <td>Memphis Grizzlies</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Pelicans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>Seattle Supersonics</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Charlotte Bobcats</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "              city                   basketballteam\n",
       "0           Boston                   Boston Celtics\n",
       "1         New York                  New York Knicks\n",
       "2     Philadelphia               Philadelphia 76ers\n",
       "3         Brooklyn                    Brooklyn Nets\n",
       "4          Toronto                  Toronto Raptors\n",
       "5        Cleveland              Cleveland Cavaliers\n",
       "6          Chicago                    Chicago Bulls\n",
       "7          Detroit                  Detroit Pistons\n",
       "8        Milwaukee                  Milwaukee Bucks\n",
       "9     Indianapolis                   Indiana Pacers\n",
       "10         Atlanta                    Atlanta Hawks\n",
       "11      Washington               Washington Wizards\n",
       "12         Orlando                    Orlando Magic\n",
       "13           Miami                       Miami Heat\n",
       "14     Los Angeles  {\"L.A. Lakers\",\"L.A. Clippers\"}\n",
       "15   San Francisco            Golden State Warriors\n",
       "16        Portland            Portland Trailblazers\n",
       "17      Sacramento                 Sacramento Kings\n",
       "18         Phoenix                     Phoenix Suns\n",
       "19     San Antonio                San Antonio Spurs\n",
       "20          Dallas                 Dallas Mavericks\n",
       "21         Houston                  Houston Rockets\n",
       "22   Oklahoma City            Oklahoma City Thunder\n",
       "23     Minneapolis           Minnesota Timberwolves\n",
       "24          Denver                   Denver Nuggets\n",
       "25  Salt Lake City                        Utah Jazz\n",
       "26         Memphis                Memphis Grizzlies\n",
       "27     New Orleans             New Orleans Pelicans\n",
       "28         Seattle              Seattle Supersonics\n",
       "29       Charlotte                Charlotte Bobcats"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_sql_query(\"SELECT * FROM nba\", con=engine)"
   ]
  },
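  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same `UPDATE ... SET ... WHERE` pattern can be tried without a PostgreSQL server. The following is a minimal sketch that assumes an in-memory SQLite database (through Python's built-in `sqlite3` module) and a three-row stand-in for the `nba` table; it is not the teams database used above. The cursor's `rowcount` attribute reports how many rows the update changed, which is a quick way to check that the `WHERE` condition matched only the rows you intended:\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# In-memory SQLite stand-in for the teams database (an assumption for this sketch)\n",
    "conn = sqlite3.connect(':memory:')\n",
    "conn.execute('CREATE TABLE nba (city TEXT, basketballteam TEXT)')\n",
    "conn.executemany(\n",
    "    'INSERT INTO nba VALUES (?, ?)',\n",
    "    [('Boston', 'Boston Celtics'),\n",
    "     ('Charlotte', 'Charlotte Hornets'),\n",
    "     ('Chicago', 'Chicago Bulls')],\n",
    ")\n",
    "\n",
    "# SET names the new value; WHERE limits the change to rows where the condition is true\n",
    "cur = conn.execute(\n",
    "    \"UPDATE nba SET basketballteam = 'Charlotte Bobcats' WHERE city = 'Charlotte'\"\n",
    ")\n",
    "rows_changed = cur.rowcount  # number of rows the update modified\n",
    "\n",
    "teams = dict(conn.execute('SELECT city, basketballteam FROM nba').fetchall())\n",
    "print(rows_changed, teams)\n",
    "```\n",
    "\n",
    "Only the Charlotte row is modified. Leaving out the `WHERE` clause would instead overwrite `basketballteam` on every row of the table."
   ]
  },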
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Deleting Records\n",
    "Sometimes you might need to delete records from a database. These situations should be rare. If a record is no longer relevant for a particular use, it is always better to leave the record in the database and use another column to denote new information that can be used to filter records later. If there are mistakes in data entry, it's better to edit existing records than to delete those records outright. If you must delete a record, the syntax to do so is\n",
    "```\n",
    "DELETE FROM table WHERE logicalcondition;\n",
    "```\n",
    "First specify the table, then the logical condition that identifies the rows you intend to delete. \n",
    "\n",
    "In the teams database, suppose we want to delete the Baltimore Ravens (go Browns!) from the NFL table. The code to do that is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<sqlalchemy.engine.result.ResultProxy at 0x115445a58>"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "DELETE FROM nfl WHERE city = 'Baltimore'; \n",
    "\"\"\"\n",
    "engine.execute(myquery)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case, `city = 'Baltimore'` identifies the rows we want to delete in the NFL table. The NFL data now no longer contains a row for the Ravens:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>city</th>\n",
       "      <th>footballteam</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Buffalo</td>\n",
       "      <td>Buffalo Bills</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Miami</td>\n",
       "      <td>Miami Dolphins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Boston</td>\n",
       "      <td>New England Patriots</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>New York</td>\n",
       "      <td>{\"New York Jets\",\"New York Giants\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Cleveland</td>\n",
       "      <td>Cleveland Browns</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Cincinnati</td>\n",
       "      <td>Cincinnati Bengals</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Pittsburgh</td>\n",
       "      <td>Pittsburgh Steelers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Kansas City</td>\n",
       "      <td>Kansas City Chiefs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Las Vegas</td>\n",
       "      <td>Las Vegas Raiders</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Los Angeles</td>\n",
       "      <td>{\"L.A. Chargers\",\"L.A. Rams\"}</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Denver</td>\n",
       "      <td>Denver Broncos</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Nashville</td>\n",
       "      <td>Tennessee Titans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Jacksonville</td>\n",
       "      <td>Jacksonville Jaguars</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Houston</td>\n",
       "      <td>Houston Texans</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Indianapolis</td>\n",
       "      <td>Indianapolis Colts</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Philadelphia</td>\n",
       "      <td>Philadelphia Eagles</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>Dallas</td>\n",
       "      <td>Dallas Cowboys</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>Washington</td>\n",
       "      <td>Washington Skins</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Atlanta</td>\n",
       "      <td>Atlanta Falcons</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Charlotte</td>\n",
       "      <td>Carolina Panthers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Tampa Bay</td>\n",
       "      <td>Tampa Bay Buccaneers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>New Orleans</td>\n",
       "      <td>New Orleans Saints</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>San Francisco</td>\n",
       "      <td>San Francisco 49ers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Phoenix</td>\n",
       "      <td>Arizona Cardinals</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Seattle</td>\n",
       "      <td>Seattle Seahawks</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Chicago</td>\n",
       "      <td>Chicago Bears</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Green Bay</td>\n",
       "      <td>Green Bay Packers</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Minneapolis</td>\n",
       "      <td>Minnesota Vikings</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Detroit</td>\n",
       "      <td>Detroit Lions</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "             city                         footballteam\n",
       "0         Buffalo                        Buffalo Bills\n",
       "1           Miami                       Miami Dolphins\n",
       "2          Boston                 New England Patriots\n",
       "3        New York  {\"New York Jets\",\"New York Giants\"}\n",
       "4       Cleveland                     Cleveland Browns\n",
       "5      Cincinnati                   Cincinnati Bengals\n",
       "6      Pittsburgh                  Pittsburgh Steelers\n",
       "7     Kansas City                   Kansas City Chiefs\n",
       "8       Las Vegas                    Las Vegas Raiders\n",
       "9     Los Angeles        {\"L.A. Chargers\",\"L.A. Rams\"}\n",
       "10         Denver                       Denver Broncos\n",
       "11      Nashville                     Tennessee Titans\n",
       "12   Jacksonville                 Jacksonville Jaguars\n",
       "13        Houston                       Houston Texans\n",
       "14   Indianapolis                   Indianapolis Colts\n",
       "15   Philadelphia                  Philadelphia Eagles\n",
       "16         Dallas                       Dallas Cowboys\n",
       "17     Washington                     Washington Skins\n",
       "18        Atlanta                      Atlanta Falcons\n",
       "19      Charlotte                    Carolina Panthers\n",
       "20      Tampa Bay                 Tampa Bay Buccaneers\n",
       "21    New Orleans                   New Orleans Saints\n",
       "22  San Francisco                  San Francisco 49ers\n",
       "23        Phoenix                    Arizona Cardinals\n",
       "24        Seattle                     Seattle Seahawks\n",
       "25        Chicago                        Chicago Bears\n",
       "26      Green Bay                    Green Bay Packers\n",
       "27    Minneapolis                    Minnesota Vikings\n",
       "28        Detroit                        Detroit Lions"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_sql_query(\"SELECT * FROM nfl\", con=engine)"
   ]
  },
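  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The advice at the start of this subsection, to mark a record with another column instead of deleting it, can be sketched as follows. This is only an illustration under stated assumptions: it uses an in-memory SQLite database through Python's built-in `sqlite3` module rather than the PostgreSQL teams database, and the `active` column is a hypothetical flag that does not exist in the `nfl` table above.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "conn = sqlite3.connect(':memory:')\n",
    "# 'active' is a hypothetical flag column, not part of the original nfl table\n",
    "conn.execute(\n",
    "    'CREATE TABLE nfl (city TEXT, footballteam TEXT, active INTEGER DEFAULT 1)'\n",
    ")\n",
    "conn.executemany(\n",
    "    'INSERT INTO nfl (city, footballteam) VALUES (?, ?)',\n",
    "    [('Baltimore', 'Baltimore Ravens'),\n",
    "     ('Cleveland', 'Cleveland Browns'),\n",
    "     ('Buffalo', 'Buffalo Bills')],\n",
    ")\n",
    "\n",
    "# Flag the record instead of deleting it\n",
    "conn.execute(\"UPDATE nfl SET active = 0 WHERE city = 'Baltimore'\")\n",
    "\n",
    "# Later queries filter on the flag; the flagged row is still stored\n",
    "active_cities = [\n",
    "    row[0]\n",
    "    for row in conn.execute('SELECT city FROM nfl WHERE active = 1 ORDER BY city')\n",
    "]\n",
    "total_rows = conn.execute('SELECT COUNT(*) FROM nfl').fetchone()[0]\n",
    "print(active_cities, total_rows)\n",
    "```\n",
    "\n",
    "The flagged row drops out of any query that filters on `active = 1`, but it can be restored by setting the flag back to 1, which is not possible after a `DELETE`."
   ]
  },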
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cleaning and Manipulating Data with SQL Read Operations\n",
    "After using joins to combine data tables in the database, the data needs to be manipulated to make the data more convenient to use. That might involve narrowing down the data to a specific subset of interest, performing calculations on the data to generate new features, and changing the appearance of the data. In \"[Tidy Data](https://www.jstatsoft.org/article/view/v059i10)\", Hadley Wickham defines four essential \"verbs\" of data manipulation:\n",
    "\n",
    "> * Filter: subsetting or removing observations based on some condition.\n",
    "> * Transform: adding or modifying variables. These modifications can involve either a single variable (e.g., log-transformation), or multiple variables (e.g., computing density from weight and volume).\n",
    "> * Aggregate: collapsing multiple values into a single value (e.g., by summing or taking means).\n",
    "> * Sort: changing the order of observations (p. 13).\n",
    "\n",
    "In addition it may be necessary to pull only a selection of the columns into the output, or to change the names of the columns to more readable and useful ones. These operations can be performed within SQL read commands by using the `WHERE` clause for filtering, mathematical operators to transform columns, the `GROUP BY` syntax for aggregation, the `ORDER BY`, `ASC`, or `DESC` clauses for sorting, and the `AS` keyword for renaming columns."
   ]
  },
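  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see how these clauses fit together, the following sketch applies all four verbs, plus renaming, in one query. It makes several assumptions: it runs against an in-memory SQLite database (Python's built-in `sqlite3` module) rather than PostgreSQL, and the `wines` table, its columns, and its five rows are made up for illustration rather than taken from the wine review database.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "conn = sqlite3.connect(':memory:')\n",
    "# Hypothetical miniature table; the column names and values are made up\n",
    "conn.execute('CREATE TABLE wines (country TEXT, price REAL, points INTEGER)')\n",
    "conn.executemany(\n",
    "    'INSERT INTO wines VALUES (?, ?, ?)',\n",
    "    [('France', 20.0, 90), ('France', 40.0, 94), ('US', 30.0, 88),\n",
    "     ('US', 10.0, 85), ('Italy', 15.0, 79)],\n",
    ")\n",
    "\n",
    "query = '''\n",
    "SELECT country,\n",
    "       AVG(price / points) AS avg_price_per_point   -- transform, aggregate, rename\n",
    "FROM wines\n",
    "WHERE points >= 80                                  -- filter\n",
    "GROUP BY country                                    -- aggregate within each country\n",
    "ORDER BY avg_price_per_point DESC;                  -- sort, largest first\n",
    "'''\n",
    "result = conn.execute(query).fetchall()\n",
    "print(result)\n",
    "```\n",
    "\n",
    "The Italian wine is removed by the filter before any aggregation happens, so the output has one row for France and one for the US, ordered from the higher average price per point to the lower."
   ]
  },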
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Example: Wine Reviews\n",
    "To illustrate how to issue queries to read data while manipulating and cleaning the data, we will use the PostgreSQL version of the wine review database that we created in module 6. If you want to follow along with these example, follow the instructions in the \"Using PostgreSQL\" subsection of module 6 to get a local wine database running on your system.\n",
    "\n",
    "For read operations, we can use the `pd.read_sql_query()` function. For that, we first have to use `sqlalchemy` to set up an engine that connects `pandas` to the database: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "engine = create_engine(\"postgresql+psycopg2://{user}:{pw}@localhost/{db}\"\n",
    "                       .format(user=\"jk8sd\", pw=pgpassword, db=\"winedb\"))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The logical ER diagram for the wine reviews database is\n",
    "\n",
    "<img src=\"https://github.com/jkropko/DS-6001/raw/master/localimages/wine_er4.png\" width=\"400\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting Columns\n",
    "`SELECT` and `FROM` are the primary SQL verbs for reading data. In many of the examples up to this points, we've issued queries like\n",
    "```\n",
    "SELECT * FROM table;\n",
    "```\n",
    "that pull all of the rows and all of the columns from a single data table. The `*` character is called a [wildcard character](https://en.wikipedia.org/wiki/Wildcard_character). When typed by itself, the wildcard captures all of the columns in a table. But sometimes we are interested in only a selection of the columns. In that case, we replace the wildcard with the columns we want to include in the output. The following syntax includes three columns from a specified data table:\n",
    "```\n",
    "SELECT col1, col2, col3 FROM table;\n",
    "```\n",
    "Suppose that I want to know the title, variety, price, points, country, and reviewer for all of the wines in the data. Title, variety, price, and points are all in the reviews table, country is in the locations table, and the reviewer (`taster_name`) is in the tasters table. To produce the data I need to join these three tables while also using `SELECT` to identify only the rows I am interested in. Inner joins are appropriate because every wine in the data has both a location and a reviewer. \n",
    "\n",
    "The best way to select columns across multiple tables is to use aliasing, the same way we did for joins. In this case, if we alias the reviews table as `r`, locations as `l`, and tasters as `t`, we can use these same aliases to inform SQL where to find each column in the `SELECT` syntax. \n",
    "\n",
    "The code to return this dataframe is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price</th>\n",
       "      <th>points</th>\n",
       "      <th>country</th>\n",
       "      <th>taster_name</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Olivier Leflaive 2006 Les Pucelles Premier Cru...</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>93</td>\n",
       "      <td>France</td>\n",
       "      <td>Roger Voss</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>The Foundry 2004 Syrah (Coastal Region)</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>35.0</td>\n",
       "      <td>87</td>\n",
       "      <td>South Africa</td>\n",
       "      <td>Susan Kostrzewa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Guilbaud Frères 2007 Le Soleil Nantais  (Musca...</td>\n",
       "      <td>Melon</td>\n",
       "      <td>11.0</td>\n",
       "      <td>87</td>\n",
       "      <td>France</td>\n",
       "      <td>Roger Voss</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Domaine du Clos du Fief 2007 Cuvée Tradition  ...</td>\n",
       "      <td>Gamay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>86</td>\n",
       "      <td>France</td>\n",
       "      <td>Roger Voss</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Domaine Philippe Delesvaux 2005 La Montée de l...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>NaN</td>\n",
       "      <td>86</td>\n",
       "      <td>France</td>\n",
       "      <td>Roger Voss</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>Sheridan Vineyard 2005 Reserve Cabernet Sauvig...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>75.0</td>\n",
       "      <td>94</td>\n",
       "      <td>US</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>Woodward Canyon 2006 Old Vines Dedication Seri...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>84.0</td>\n",
       "      <td>94</td>\n",
       "      <td>US</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>Chanson Père et Fils 2005 Champs Gains Premier...</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>115.0</td>\n",
       "      <td>93</td>\n",
       "      <td>France</td>\n",
       "      <td>Roger Voss</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>Mark Ryan 2006 Chardonnay (Columbia Valley (WA))</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>93</td>\n",
       "      <td>US</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>Joseph Drouhin 2007  Grands-Echezeaux</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>285.0</td>\n",
       "      <td>94</td>\n",
       "      <td>France</td>\n",
       "      <td>Roger Voss</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    title             variety  \\\n",
       "0       Olivier Leflaive 2006 Les Pucelles Premier Cru...          Chardonnay   \n",
       "1                 The Foundry 2004 Syrah (Coastal Region)               Syrah   \n",
       "2       Guilbaud Frères 2007 Le Soleil Nantais  (Musca...               Melon   \n",
       "3       Domaine du Clos du Fief 2007 Cuvée Tradition  ...               Gamay   \n",
       "4       Domaine Philippe Delesvaux 2005 La Montée de l...  Cabernet Sauvignon   \n",
       "...                                                   ...                 ...   \n",
       "103722  Sheridan Vineyard 2005 Reserve Cabernet Sauvig...  Cabernet Sauvignon   \n",
       "103723  Woodward Canyon 2006 Old Vines Dedication Seri...  Cabernet Sauvignon   \n",
       "103724  Chanson Père et Fils 2005 Champs Gains Premier...          Chardonnay   \n",
       "103725   Mark Ryan 2006 Chardonnay (Columbia Valley (WA))          Chardonnay   \n",
       "103726              Joseph Drouhin 2007  Grands-Echezeaux          Pinot Noir   \n",
       "\n",
       "        price  points       country      taster_name  \n",
       "0         NaN      93        France       Roger Voss  \n",
       "1        35.0      87  South Africa  Susan Kostrzewa  \n",
       "2        11.0      87        France       Roger Voss  \n",
       "3         NaN      86        France       Roger Voss  \n",
       "4         NaN      86        France       Roger Voss  \n",
       "...       ...     ...           ...              ...  \n",
       "103722   75.0      94            US     Paul Gregutt  \n",
       "103723   84.0      94            US     Paul Gregutt  \n",
       "103724  115.0      93        France       Roger Voss  \n",
       "103725    NaN      93            US     Paul Gregutt  \n",
       "103726  285.0      94        France       Roger Voss  \n",
       "\n",
       "[103727 rows x 6 columns]"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery=\"\"\"\n",
    "SELECT r.title, r.variety, r.price, r.points, l.country, t.taster_name FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "INNER JOIN tasters t\n",
    "    ON r.taster_id = t.taster_id;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Logical Statements\n",
    "Most programming languages have the capacity to evaluate a statement as being either true or false, or true for some values and false for others. A logical statement uses **logical operators** that define how values should be compared. In SQL, logical statements are used in conjunction with the `WHERE` statement to select the rows to include in the output.\n",
    "\n",
    "For SQL, logical statements either compare a column to another column, or compare a column to one or more reference values. The following logical operators are available:\n",
    "\n",
    "* `=` - is equal to?\n",
    "* `<` - is less than?\n",
    "* `>` - is greater than?\n",
    "* `<=` - is less than or equal to?\n",
    "* `>=` - is greater than or equal to?\n",
    "* `<>` - is not equal to?\n",
    "* `BETWEEN a AND b` - true if a value exists within the range from a to b, including a and b \n",
    "* `IN ('element1','element2','element3')` - true if a value is one of the elements in the given set\n",
    "* `NOT` - true if the rest of the logical statement is false, false if the rest of the logical statement is true\n",
    "* `AND` - links separate logical statements together such that the overall statement is true only when all of the linked statements are true\n",
    "* `OR` - links separate logical statements together such that the overall statement is true when any of the linked statements are true\n",
    "* `LIKE pattern` - true if the string value matches the given pattern:\n",
    "    * `LIKE '%%text'` captures all rows in which a given column ends with 'text'\n",
    "    * `LIKE 'text%%'` captures all rows in which a given column begins with 'text'\n",
    "    * `LIKE '%%text%%'` captures all rows in which a given column contains 'text' somewhere in its string value\n",
    "* `()` - parts of the logical statement that are contained within parentheses are evaluated first\n",
    "\n",
    "I will show examples of how to use these logical statements for filtering rows, in the next section."
   ]
  },
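  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick, self-contained preview, several of the operators listed above can be tried on a toy table. This sketch assumes an in-memory SQLite database (Python's built-in `sqlite3` module) and a made-up four-row `wines` table, and `titles()` is a hypothetical helper written only for this illustration. It writes the `LIKE` wildcard as a single `%` because `sqlite3` does not treat `%` as a parameter marker.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "conn = sqlite3.connect(':memory:')\n",
    "# Hypothetical rows, invented for this example\n",
    "conn.execute('CREATE TABLE wines (title TEXT, country TEXT, points INTEGER)')\n",
    "conn.executemany(\n",
    "    'INSERT INTO wines VALUES (?, ?, ?)',\n",
    "    [('Krug Brut', 'France', 100), ('Foundry Syrah', 'South Africa', 87),\n",
    "     ('Ryan Chardonnay', 'US', 93), ('Drouhin Pinot Noir', 'France', 94)],\n",
    ")\n",
    "\n",
    "def titles(condition):\n",
    "    # Return the titles of the rows for which the logical condition is true.\n",
    "    # The condition is a fixed literal here; never build SQL from user input this way.\n",
    "    sql = 'SELECT title FROM wines WHERE ' + condition + ' ORDER BY title'\n",
    "    return [row[0] for row in conn.execute(sql)]\n",
    "\n",
    "between = titles('points BETWEEN 90 AND 94')   # both endpoints are included\n",
    "in_set = titles(\"country IN ('France', 'US')\")\n",
    "not_france = titles(\"NOT country = 'France'\")\n",
    "combined = titles(\"country = 'France' AND (points = 100 OR points < 90)\")\n",
    "ends_with = titles(\"title LIKE '%Syrah'\")\n",
    "print(between, in_set, not_france, combined, ends_with)\n",
    "```\n",
    "\n",
    "In the `combined` condition, the parentheses make SQL evaluate the `OR` first, so the row must be French and must also score either exactly 100 or below 90."
   ]
  },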
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Filtering Rows\n",
    "Suppose we wanted to know the title, the variety, and the price of the French wines that Roger Voss scored as 100. It's a simple semantic sentence, but it connects to a more complicated set of SQL functions. First consider all of the columns we need to use to process the sentence:\n",
    "\n",
    "* title, from the reviews table\n",
    "* variety, from the reviews table\n",
    "* price, from the reviews table\n",
    "* French, a value of country, from the locations table\n",
    "* Roger Voss, a value of taster name, from the tasters table,\n",
    "* and 100, a value of points, from the reviews table.\n",
    "\n",
    "Because we need to use data from the reviews, locations, and tasters tables, we need to inner join reviews, locations, and tasters. \n",
    "\n",
    "But then on top of this join, we need to restrict both the columns and rows. We only want title, variety, and price in the final data, so we use `SELECT` to keep only these columns. \n",
    "\n",
    "To restrict the rows, we use `WHERE` along with a logical condition. This logical condition has a few parts: we want wines in which `country='France'`, `taster_name='Roger Voss'`, and `points=100`. All three conditions need to be true for us to want to keep the row, so we connect the three statements with `AND`.\n",
    "\n",
    "The SQL query that returns the title, the variety, and the price of the French wines that Roger Voss scored as 100 is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Krug 2002 Brut  (Champagne)</td>\n",
       "      <td>Champagne Blend</td>\n",
       "      <td>259.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Château Léoville Barton 2010  Saint-Julien</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>150.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Louis Roederer 2008 Cristal Vintage Brut  (Cha...</td>\n",
       "      <td>Champagne Blend</td>\n",
       "      <td>250.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Salon 2006 Le Mesnil Blanc de Blancs Brut Char...</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>617.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Château Lafite Rothschild 2010  Pauillac</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>1500.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Château Cheval Blanc 2010  Saint-Émilion</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>1500.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Château Léoville Las Cases 2010  Saint-Julien</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>359.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>Château Haut-Brion 2014  Pessac-Léognan</td>\n",
       "      <td>Bordeaux-style White Blend</td>\n",
       "      <td>848.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               title  \\\n",
       "0                        Krug 2002 Brut  (Champagne)   \n",
       "1         Château Léoville Barton 2010  Saint-Julien   \n",
       "2  Louis Roederer 2008 Cristal Vintage Brut  (Cha...   \n",
       "3  Salon 2006 Le Mesnil Blanc de Blancs Brut Char...   \n",
       "4           Château Lafite Rothschild 2010  Pauillac   \n",
       "5           Château Cheval Blanc 2010  Saint-Émilion   \n",
       "6      Château Léoville Las Cases 2010  Saint-Julien   \n",
       "7            Château Haut-Brion 2014  Pessac-Léognan   \n",
       "\n",
       "                      variety   price  \n",
       "0             Champagne Blend   259.0  \n",
       "1    Bordeaux-style Red Blend   150.0  \n",
       "2             Champagne Blend   250.0  \n",
       "3                  Chardonnay   617.0  \n",
       "4    Bordeaux-style Red Blend  1500.0  \n",
       "5    Bordeaux-style Red Blend  1500.0  \n",
       "6    Bordeaux-style Red Blend   359.0  \n",
       "7  Bordeaux-style White Blend   848.0  "
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT r.title, r.variety, r.price FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "INNER JOIN tasters t\n",
    "    ON r.taster_id = t.taster_id\n",
    "WHERE l.country='France' AND t.taster_name='Roger Voss' AND r.points=100;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "I like my wine local or low-cost. As another example, suppose we want the title, variety, price, points, country, and province for all of the wines with scores of 90 or more that either cost between 5 and 10 dollars or are from Virginia. In this case, we need to join the reviews and locations tables and write a logical condition that matches these criteria. Note that `BETWEEN` is inclusive, so wines priced at exactly 5 or 10 dollars are kept. The logical condition is\n",
    "```\n",
    "points >= 90 AND (price BETWEEN 5 AND 10 OR province = 'Virginia')\n",
    "```\n",
    "The entire SQL query is"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price</th>\n",
       "      <th>points</th>\n",
       "      <th>country</th>\n",
       "      <th>province</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Château Vircoulon 2016  Bordeaux Blanc</td>\n",
       "      <td>Bordeaux-style White Blend</td>\n",
       "      <td>10.0</td>\n",
       "      <td>90</td>\n",
       "      <td>France</td>\n",
       "      <td>Bordeaux</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Mano A Mano 2011 Tempranillo (Vino de la Tierr...</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>9.0</td>\n",
       "      <td>90</td>\n",
       "      <td>Spain</td>\n",
       "      <td>Central Spain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Aveleda 2014 Quinta da Aveleda Estate Bottled ...</td>\n",
       "      <td>Portuguese White</td>\n",
       "      <td>9.0</td>\n",
       "      <td>90</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>Vinho Verde</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Chateau Ste. Michelle 2011 Riesling (Columbia ...</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>9.0</td>\n",
       "      <td>91</td>\n",
       "      <td>US</td>\n",
       "      <td>Washington</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Quinta do Portal 2007 Mural Reserva Red (Douro)</td>\n",
       "      <td>Portuguese Red</td>\n",
       "      <td>10.0</td>\n",
       "      <td>91</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>Douro</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>76</th>\n",
       "      <td>Casaleiro 2012 Reserva Touriga Nacional-Castel...</td>\n",
       "      <td>Portuguese Red</td>\n",
       "      <td>9.0</td>\n",
       "      <td>90</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>Tejo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>77</th>\n",
       "      <td>Aveleda 2015 Quinta da Aveleda White (Vinho Ve...</td>\n",
       "      <td>Portuguese White</td>\n",
       "      <td>9.0</td>\n",
       "      <td>90</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>Vinho Verde</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>78</th>\n",
       "      <td>Cookies &amp; Cream 2010 Merlot (California)</td>\n",
       "      <td>Merlot</td>\n",
       "      <td>10.0</td>\n",
       "      <td>90</td>\n",
       "      <td>US</td>\n",
       "      <td>California</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>79</th>\n",
       "      <td>Lovingston 2012 Josie's Knoll Merlot (Monticello)</td>\n",
       "      <td>Merlot</td>\n",
       "      <td>20.0</td>\n",
       "      <td>91</td>\n",
       "      <td>US</td>\n",
       "      <td>Virginia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>80</th>\n",
       "      <td>Aveleda 2016 Quinta da Aveleda White (Vinho Ve...</td>\n",
       "      <td>Portuguese White</td>\n",
       "      <td>10.0</td>\n",
       "      <td>90</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>Vinho Verde</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>81 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                title  \\\n",
       "0              Château Vircoulon 2016  Bordeaux Blanc   \n",
       "1   Mano A Mano 2011 Tempranillo (Vino de la Tierr...   \n",
       "2   Aveleda 2014 Quinta da Aveleda Estate Bottled ...   \n",
       "3   Chateau Ste. Michelle 2011 Riesling (Columbia ...   \n",
       "4     Quinta do Portal 2007 Mural Reserva Red (Douro)   \n",
       "..                                                ...   \n",
       "76  Casaleiro 2012 Reserva Touriga Nacional-Castel...   \n",
       "77  Aveleda 2015 Quinta da Aveleda White (Vinho Ve...   \n",
       "78           Cookies & Cream 2010 Merlot (California)   \n",
       "79  Lovingston 2012 Josie's Knoll Merlot (Monticello)   \n",
       "80  Aveleda 2016 Quinta da Aveleda White (Vinho Ve...   \n",
       "\n",
       "                       variety  price  points   country       province  \n",
       "0   Bordeaux-style White Blend   10.0      90    France       Bordeaux  \n",
       "1                  Tempranillo    9.0      90     Spain  Central Spain  \n",
       "2             Portuguese White    9.0      90  Portugal    Vinho Verde  \n",
       "3                     Riesling    9.0      91        US     Washington  \n",
       "4               Portuguese Red   10.0      91  Portugal          Douro  \n",
       "..                         ...    ...     ...       ...            ...  \n",
       "76              Portuguese Red    9.0      90  Portugal           Tejo  \n",
       "77            Portuguese White    9.0      90  Portugal    Vinho Verde  \n",
       "78                      Merlot   10.0      90        US     California  \n",
       "79                      Merlot   20.0      91        US       Virginia  \n",
       "80            Portuguese White   10.0      90  Portugal    Vinho Verde  \n",
       "\n",
       "[81 rows x 6 columns]"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT r.title, r.variety, r.price, r.points, l.country, l.province FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "WHERE r.points >= 90 AND (r.price BETWEEN 5 AND 10 OR l.province = 'Virginia');\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parentheses in this last query are needed to ensure that the DBMS evaluates the `OR` condition first. In SQL, `AND` takes precedence over `OR`, so without the parentheses,\n",
    "```\n",
    "points >= 90 AND price BETWEEN 5 AND 10 OR province = 'Virginia'\n",
    "```\n",
    "the DBMS groups the first two conditions together before considering the third, so the statement is equivalent to\n",
    "```\n",
    "(points >= 90 AND price BETWEEN 5 AND 10) OR province = 'Virginia'\n",
    "```\n",
    "and it returns all wines that have scores of at least 90 and prices between 5 and 10 dollars, along with all wines from Virginia regardless of their scores:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price</th>\n",
       "      <th>points</th>\n",
       "      <th>country</th>\n",
       "      <th>province</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Veramar 2016 JB Winemaker Series Cabernet Fran...</td>\n",
       "      <td>Cabernet Franc</td>\n",
       "      <td>34.0</td>\n",
       "      <td>86</td>\n",
       "      <td>US</td>\n",
       "      <td>Virginia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Château Vircoulon 2016  Bordeaux Blanc</td>\n",
       "      <td>Bordeaux-style White Blend</td>\n",
       "      <td>10.0</td>\n",
       "      <td>90</td>\n",
       "      <td>France</td>\n",
       "      <td>Bordeaux</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Mano A Mano 2011 Tempranillo (Vino de la Tierr...</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>9.0</td>\n",
       "      <td>90</td>\n",
       "      <td>Spain</td>\n",
       "      <td>Central Spain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Trump 2014 Rosé (Monticello)</td>\n",
       "      <td>Rosé</td>\n",
       "      <td>14.0</td>\n",
       "      <td>86</td>\n",
       "      <td>US</td>\n",
       "      <td>Virginia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>The Boneyard 2014 Chardonnay (Virginia)</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>15.0</td>\n",
       "      <td>86</td>\n",
       "      <td>US</td>\n",
       "      <td>Virginia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>444</th>\n",
       "      <td>Annefield Vineyards 2009 Cabernet Franc (Virgi...</td>\n",
       "      <td>Cabernet Franc</td>\n",
       "      <td>29.0</td>\n",
       "      <td>88</td>\n",
       "      <td>US</td>\n",
       "      <td>Virginia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>445</th>\n",
       "      <td>Lovingston 2012 Josie's Knoll Merlot (Monticello)</td>\n",
       "      <td>Merlot</td>\n",
       "      <td>20.0</td>\n",
       "      <td>91</td>\n",
       "      <td>US</td>\n",
       "      <td>Virginia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>446</th>\n",
       "      <td>Aveleda 2016 Quinta da Aveleda White (Vinho Ve...</td>\n",
       "      <td>Portuguese White</td>\n",
       "      <td>10.0</td>\n",
       "      <td>90</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>Vinho Verde</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>447</th>\n",
       "      <td>Tarara 2013 Cabernet Franc (Virginia)</td>\n",
       "      <td>Cabernet Franc</td>\n",
       "      <td>25.0</td>\n",
       "      <td>85</td>\n",
       "      <td>US</td>\n",
       "      <td>Virginia</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>448</th>\n",
       "      <td>Paradise Springs 2014 Nana's Rosé (Virginia)</td>\n",
       "      <td>Rosé</td>\n",
       "      <td>22.0</td>\n",
       "      <td>86</td>\n",
       "      <td>US</td>\n",
       "      <td>Virginia</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>449 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                 title  \\\n",
       "0    Veramar 2016 JB Winemaker Series Cabernet Fran...   \n",
       "1               Château Vircoulon 2016  Bordeaux Blanc   \n",
       "2    Mano A Mano 2011 Tempranillo (Vino de la Tierr...   \n",
       "3                         Trump 2014 Rosé (Monticello)   \n",
       "4              The Boneyard 2014 Chardonnay (Virginia)   \n",
       "..                                                 ...   \n",
       "444  Annefield Vineyards 2009 Cabernet Franc (Virgi...   \n",
       "445  Lovingston 2012 Josie's Knoll Merlot (Monticello)   \n",
       "446  Aveleda 2016 Quinta da Aveleda White (Vinho Ve...   \n",
       "447              Tarara 2013 Cabernet Franc (Virginia)   \n",
       "448       Paradise Springs 2014 Nana's Rosé (Virginia)   \n",
       "\n",
       "                        variety  price  points   country       province  \n",
       "0                Cabernet Franc   34.0      86        US       Virginia  \n",
       "1    Bordeaux-style White Blend   10.0      90    France       Bordeaux  \n",
       "2                   Tempranillo    9.0      90     Spain  Central Spain  \n",
       "3                          Rosé   14.0      86        US       Virginia  \n",
       "4                    Chardonnay   15.0      86        US       Virginia  \n",
       "..                          ...    ...     ...       ...            ...  \n",
       "444              Cabernet Franc   29.0      88        US       Virginia  \n",
       "445                      Merlot   20.0      91        US       Virginia  \n",
       "446            Portuguese White   10.0      90  Portugal    Vinho Verde  \n",
       "447              Cabernet Franc   25.0      85        US       Virginia  \n",
       "448                        Rosé   22.0      86        US       Virginia  \n",
       "\n",
       "[449 rows x 6 columns]"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT r.title, r.variety, r.price, r.points, l.country, l.province FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "WHERE r.points >= 90 AND r.price BETWEEN 5 AND 10 OR l.province = 'Virginia';\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose I'm open to many wines, but I have a thing against wines from the U.S., and I don't like Pinot Noir, Pinot Gris, or Chardonnay. I want to query the wines database to return the title, variety, price, and country of all wines except for the American ones and the varieties I dislike. The SQL query requires joining the reviews and locations tables, and using negation in the logical condition with the `<>` and `NOT` operators, like this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price</th>\n",
       "      <th>country</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>The Foundry 2004 Syrah (Coastal Region)</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>35.0</td>\n",
       "      <td>South Africa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Guilbaud Frères 2007 Le Soleil Nantais  (Musca...</td>\n",
       "      <td>Melon</td>\n",
       "      <td>11.0</td>\n",
       "      <td>France</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Domaine du Clos du Fief 2007 Cuvée Tradition  ...</td>\n",
       "      <td>Gamay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>France</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Domaine Philippe Delesvaux 2005 La Montée de l...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>NaN</td>\n",
       "      <td>France</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Georges Duboeuf 2007  Beaujolais-Villages</td>\n",
       "      <td>Gamay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>France</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57429</th>\n",
       "      <td>Indomita NV Rosé Sparkling (Casablanca Valley)</td>\n",
       "      <td>Sparkling Blend</td>\n",
       "      <td>18.0</td>\n",
       "      <td>Chile</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57430</th>\n",
       "      <td>Intipalka 2013 Valle del Sol Tannat (Ica)</td>\n",
       "      <td>Tannat</td>\n",
       "      <td>14.0</td>\n",
       "      <td>Peru</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57431</th>\n",
       "      <td>Lobster Reef 2014 Sauvignon Blanc (Marlborough)</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>12.0</td>\n",
       "      <td>New Zealand</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57432</th>\n",
       "      <td>Millaman 2014 Estate Reserve Sauvignon Blanc (...</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>10.0</td>\n",
       "      <td>Chile</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>57433</th>\n",
       "      <td>Royal Tokaji 1999 Mézes Mály Aszú 6 Puttonyos ...</td>\n",
       "      <td>Tokaji</td>\n",
       "      <td>175.0</td>\n",
       "      <td>Hungary</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>57434 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                   title             variety  \\\n",
       "0                The Foundry 2004 Syrah (Coastal Region)               Syrah   \n",
       "1      Guilbaud Frères 2007 Le Soleil Nantais  (Musca...               Melon   \n",
       "2      Domaine du Clos du Fief 2007 Cuvée Tradition  ...               Gamay   \n",
       "3      Domaine Philippe Delesvaux 2005 La Montée de l...  Cabernet Sauvignon   \n",
       "4              Georges Duboeuf 2007  Beaujolais-Villages               Gamay   \n",
       "...                                                  ...                 ...   \n",
       "57429     Indomita NV Rosé Sparkling (Casablanca Valley)     Sparkling Blend   \n",
       "57430          Intipalka 2013 Valle del Sol Tannat (Ica)              Tannat   \n",
       "57431    Lobster Reef 2014 Sauvignon Blanc (Marlborough)     Sauvignon Blanc   \n",
       "57432  Millaman 2014 Estate Reserve Sauvignon Blanc (...     Sauvignon Blanc   \n",
       "57433  Royal Tokaji 1999 Mézes Mály Aszú 6 Puttonyos ...              Tokaji   \n",
       "\n",
       "       price       country  \n",
       "0       35.0  South Africa  \n",
       "1       11.0        France  \n",
       "2        NaN        France  \n",
       "3        NaN        France  \n",
       "4        NaN        France  \n",
       "...      ...           ...  \n",
       "57429   18.0         Chile  \n",
       "57430   14.0          Peru  \n",
       "57431   12.0   New Zealand  \n",
       "57432   10.0         Chile  \n",
       "57433  175.0       Hungary  \n",
       "\n",
       "[57434 rows x 4 columns]"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT r.title, r.variety, r.price, l.country FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "WHERE country <> 'US' AND variety NOT IN ('Pinot Noir', 'Pinot Gris', 'Chardonnay');\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we want all of the columns from the reviews table for wines whose descriptions contain both \"smoke\" and \"chocolate\". String matching can be case sensitive, so we convert the descriptions to lower case in the `WHERE` clause so that \"chocolate\" and \"Chocolate\" are both matched. The following query returns those wines:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>wine_id</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>description</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>taster_id</th>\n",
       "      <th>winery_id</th>\n",
       "      <th>location_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>6792</td>\n",
       "      <td>Guardian Peak 2006 Shiraz (Western Cape)</td>\n",
       "      <td>Shiraz</td>\n",
       "      <td>A gorgeous nose of plums, chocolate and red fr...</td>\n",
       "      <td>89</td>\n",
       "      <td>15.0</td>\n",
       "      <td>16</td>\n",
       "      <td>7363</td>\n",
       "      <td>1141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>7812</td>\n",
       "      <td>Hightower 2006 Cabernet Sauvignon (Columbia Va...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>Sourced largely from Red Mountain fruit, this ...</td>\n",
       "      <td>87</td>\n",
       "      <td>35.0</td>\n",
       "      <td>3</td>\n",
       "      <td>7629</td>\n",
       "      <td>1474</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>7866</td>\n",
       "      <td>Viña Cobos 2012 Bramare Marchiori Vineyard Mal...</td>\n",
       "      <td>Malbec</td>\n",
       "      <td>Toasty woodsmoke aromas are matched by wild be...</td>\n",
       "      <td>94</td>\n",
       "      <td>90.0</td>\n",
       "      <td>5</td>\n",
       "      <td>13962</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>8110</td>\n",
       "      <td>Mendel 2013 Unus Red (Mendoza)</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>Rich aromas of raisin, cassis and blackberry a...</td>\n",
       "      <td>93</td>\n",
       "      <td>50.0</td>\n",
       "      <td>5</td>\n",
       "      <td>9746</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>9262</td>\n",
       "      <td>Freakshow 2014 Cabernet Sauvignon (Lodi)</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>Rich, ripe and oaky, this full-bodied wine has...</td>\n",
       "      <td>91</td>\n",
       "      <td>20.0</td>\n",
       "      <td>10</td>\n",
       "      <td>6893</td>\n",
       "      <td>1304</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>260</th>\n",
       "      <td>4862</td>\n",
       "      <td>Hearst Ranch 2013 Lone Tree Cabernet Franc (Pa...</td>\n",
       "      <td>Cabernet Franc</td>\n",
       "      <td>Made in a thick, oaky style, this shows burned...</td>\n",
       "      <td>87</td>\n",
       "      <td>35.0</td>\n",
       "      <td>8</td>\n",
       "      <td>7498</td>\n",
       "      <td>1337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>261</th>\n",
       "      <td>4690</td>\n",
       "      <td>Pear Valley 2013 Distraction Red (Paso Robles)</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>The signature bottling from this winery, this ...</td>\n",
       "      <td>91</td>\n",
       "      <td>35.0</td>\n",
       "      <td>8</td>\n",
       "      <td>10762</td>\n",
       "      <td>1337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>262</th>\n",
       "      <td>4794</td>\n",
       "      <td>Estate Constantin Gofas 2008 Agiorgitiko (Nemea)</td>\n",
       "      <td>Agiorgitiko</td>\n",
       "      <td>This wine has the plucky character typical of ...</td>\n",
       "      <td>85</td>\n",
       "      <td>18.0</td>\n",
       "      <td>16</td>\n",
       "      <td>6366</td>\n",
       "      <td>668</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>263</th>\n",
       "      <td>6574</td>\n",
       "      <td>Fielding Hills 2006 RiverBend Vineyard Syrah (...</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>Bold and forward, this estate-grown Syrah fair...</td>\n",
       "      <td>94</td>\n",
       "      <td>40.0</td>\n",
       "      <td>3</td>\n",
       "      <td>6619</td>\n",
       "      <td>1483</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>264</th>\n",
       "      <td>5020</td>\n",
       "      <td>Corliss Estates 2007 Cabernet Sauvignon (Colum...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>Concentrated and wonderfully aromatic, this ag...</td>\n",
       "      <td>94</td>\n",
       "      <td>75.0</td>\n",
       "      <td>3</td>\n",
       "      <td>4677</td>\n",
       "      <td>1474</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>265 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     wine_id                                              title  \\\n",
       "0       6792           Guardian Peak 2006 Shiraz (Western Cape)   \n",
       "1       7812  Hightower 2006 Cabernet Sauvignon (Columbia Va...   \n",
       "2       7866  Viña Cobos 2012 Bramare Marchiori Vineyard Mal...   \n",
       "3       8110                     Mendel 2013 Unus Red (Mendoza)   \n",
       "4       9262           Freakshow 2014 Cabernet Sauvignon (Lodi)   \n",
       "..       ...                                                ...   \n",
       "260     4862  Hearst Ranch 2013 Lone Tree Cabernet Franc (Pa...   \n",
       "261     4690     Pear Valley 2013 Distraction Red (Paso Robles)   \n",
       "262     4794   Estate Constantin Gofas 2008 Agiorgitiko (Nemea)   \n",
       "263     6574  Fielding Hills 2006 RiverBend Vineyard Syrah (...   \n",
       "264     5020  Corliss Estates 2007 Cabernet Sauvignon (Colum...   \n",
       "\n",
       "                      variety  \\\n",
       "0                      Shiraz   \n",
       "1          Cabernet Sauvignon   \n",
       "2                      Malbec   \n",
       "3    Bordeaux-style Red Blend   \n",
       "4          Cabernet Sauvignon   \n",
       "..                        ...   \n",
       "260            Cabernet Franc   \n",
       "261  Bordeaux-style Red Blend   \n",
       "262               Agiorgitiko   \n",
       "263                     Syrah   \n",
       "264        Cabernet Sauvignon   \n",
       "\n",
       "                                           description  points  price  \\\n",
       "0    A gorgeous nose of plums, chocolate and red fr...      89   15.0   \n",
       "1    Sourced largely from Red Mountain fruit, this ...      87   35.0   \n",
       "2    Toasty woodsmoke aromas are matched by wild be...      94   90.0   \n",
       "3    Rich aromas of raisin, cassis and blackberry a...      93   50.0   \n",
       "4    Rich, ripe and oaky, this full-bodied wine has...      91   20.0   \n",
       "..                                                 ...     ...    ...   \n",
       "260  Made in a thick, oaky style, this shows burned...      87   35.0   \n",
       "261  The signature bottling from this winery, this ...      91   35.0   \n",
       "262  This wine has the plucky character typical of ...      85   18.0   \n",
       "263  Bold and forward, this estate-grown Syrah fair...      94   40.0   \n",
       "264  Concentrated and wonderfully aromatic, this ag...      94   75.0   \n",
       "\n",
       "     taster_id  winery_id  location_id  \n",
       "0           16       7363         1141  \n",
       "1            3       7629         1474  \n",
       "2            5      13962            4  \n",
       "3            5       9746            7  \n",
       "4           10       6893         1304  \n",
       "..         ...        ...          ...  \n",
       "260          8       7498         1337  \n",
       "261          8      10762         1337  \n",
       "262         16       6366          668  \n",
       "263          3       6619         1483  \n",
       "264          3       4677         1474  \n",
       "\n",
       "[265 rows x 9 columns]"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM reviews \n",
    "WHERE LOWER(description) LIKE '%%smoke%%' AND LOWER(description) LIKE '%%chocolate%%';\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are situations in which we want to display only some of the records that match a particular query. For that, we can use the `LIMIT` and `OFFSET` clauses. `LIMIT` sets the number of records to extract, and `OFFSET` set the starting row. For example, adding\n",
    "```\n",
    "LIMIT 10\n",
    "```\n",
    "to a query instructs the DBMS to extract only the first 10 rows of the output data. Adding\n",
    "```\n",
    "LIMIT 10 OFFSET 5\n",
    "```\n",
    "tells the DBMS to extract 10 rows, after first skipping the first 5 rows: so these clauses together return rows 6 through 15. To see the 4th through 7th rows from the previous query to the wine reviews database, we can type:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>wine_id</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>description</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>taster_id</th>\n",
       "      <th>winery_id</th>\n",
       "      <th>location_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>8110</td>\n",
       "      <td>Mendel 2013 Unus Red (Mendoza)</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>Rich aromas of raisin, cassis and blackberry a...</td>\n",
       "      <td>93</td>\n",
       "      <td>50.0</td>\n",
       "      <td>5</td>\n",
       "      <td>9746</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>9262</td>\n",
       "      <td>Freakshow 2014 Cabernet Sauvignon (Lodi)</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>Rich, ripe and oaky, this full-bodied wine has...</td>\n",
       "      <td>91</td>\n",
       "      <td>20.0</td>\n",
       "      <td>10</td>\n",
       "      <td>6893</td>\n",
       "      <td>1304</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>9746</td>\n",
       "      <td>Carmel 2013 Admon Vineyard Cabernet Sauvignon ...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>This wine has offers aromas of dark plum and c...</td>\n",
       "      <td>89</td>\n",
       "      <td>35.0</td>\n",
       "      <td>14</td>\n",
       "      <td>1925</td>\n",
       "      <td>694</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>10326</td>\n",
       "      <td>De Martino 2009 Alto de Piedras Single Vineyar...</td>\n",
       "      <td>Carmenère</td>\n",
       "      <td>A big, earthy type of wine with a ton of ripen...</td>\n",
       "      <td>90</td>\n",
       "      <td>45.0</td>\n",
       "      <td>5</td>\n",
       "      <td>4973</td>\n",
       "      <td>182</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   wine_id                                              title  \\\n",
       "0     8110                     Mendel 2013 Unus Red (Mendoza)   \n",
       "1     9262           Freakshow 2014 Cabernet Sauvignon (Lodi)   \n",
       "2     9746  Carmel 2013 Admon Vineyard Cabernet Sauvignon ...   \n",
       "3    10326  De Martino 2009 Alto de Piedras Single Vineyar...   \n",
       "\n",
       "                    variety  \\\n",
       "0  Bordeaux-style Red Blend   \n",
       "1        Cabernet Sauvignon   \n",
       "2        Cabernet Sauvignon   \n",
       "3                 Carmenère   \n",
       "\n",
       "                                         description  points  price  \\\n",
       "0  Rich aromas of raisin, cassis and blackberry a...      93   50.0   \n",
       "1  Rich, ripe and oaky, this full-bodied wine has...      91   20.0   \n",
       "2  This wine has offers aromas of dark plum and c...      89   35.0   \n",
       "3  A big, earthy type of wine with a ton of ripen...      90   45.0   \n",
       "\n",
       "   taster_id  winery_id  location_id  \n",
       "0          5       9746            7  \n",
       "1         10       6893         1304  \n",
       "2         14       1925          694  \n",
       "3          5       4973          182  "
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT * FROM reviews \n",
    "WHERE LOWER(description) LIKE '%%smoke%%' AND LOWER(description) LIKE '%%chocolate%%'\n",
    "LIMIT 4 OFFSET 3;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
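  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The arithmetic behind `LIMIT` and `OFFSET` can be sketched as a small pagination helper. This is only a minimal sketch, not part of the wine reviews workflow: it assumes a throwaway in-memory SQLite database with a made-up `reviews` table, and `fetch_page` is a hypothetical name. It also adds `ORDER BY wine_id`, because without an explicit ordering a DBMS is free to return rows in any order, so the same `LIMIT` and `OFFSET` could select different rows on different runs.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Assumption: an in-memory SQLite table stands in for the real reviews table.\n",
    "con = sqlite3.connect(':memory:')\n",
    "con.execute('CREATE TABLE reviews (wine_id INTEGER, points INTEGER)')\n",
    "con.executemany('INSERT INTO reviews VALUES (?, ?)',\n",
    "                [(i, 80 + i) for i in range(1, 21)])\n",
    "\n",
    "def fetch_page(page, page_size):\n",
    "    # Hypothetical helper: return one page of wine_ids, with pages numbered from 0.\n",
    "    query = 'SELECT wine_id FROM reviews ORDER BY wine_id LIMIT ? OFFSET ?'\n",
    "    rows = con.execute(query, (page_size, page * page_size)).fetchall()\n",
    "    return [row[0] for row in rows]\n",
    "\n",
    "# LIMIT 4 OFFSET 3 skips three rows and keeps the next four (rows 4 through 7).\n",
    "rows_4_to_7 = con.execute(\n",
    "    'SELECT wine_id FROM reviews ORDER BY wine_id LIMIT 4 OFFSET 3').fetchall()\n",
    "\n",
    "print(fetch_page(0, 10))\n",
    "print([row[0] for row in rows_4_to_7])\n",
    "```\n",
    "\n",
    "In this sketch, page `k` (counting from 0) with `n` rows per page corresponds to `LIMIT n OFFSET k*n`."
   ]
  },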
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sorting Data\n",
    "Sorting data refers to rearranging the rows of a dataframe. Sorting is a cosmetic thing to do to data because the order of the rows should not change the meaning of the data both in terms of storage (rearranging the rows should NOT change the meaning of each row), or for most analytical models (rearranging rows won't change the parameter estimates from a linear regression, for example). But sorting is a way to visualize important characteristics about the data and to quickly see important records with maximum and minimum values of key features.\n",
    "\n",
    "To sort the output data, use the `ORDER BY` syntax within an SQL query. The general syntax for `ORDER BY` is\n",
    "```\n",
    "ORDER BY column1, column2, column3\n",
    "```\n",
    "Writing more than one column is optional. If more than one column is entered, then the second column is used to *break ties* between rows that have the same value of the first column. If a third column is entered, it's used to break ties between rows that have the same value for both the first and second columns. In addition, each column can be sorted in ascending or descending order by typing either `ASC` (this is the default, so typing `ASC` is optional, but useful for making the SQL code more readable) or `DESC` immediately after the column name.\n",
    "\n",
    "For example, what are the top rated wines from Virginia? And of these top rated wines, which ones are cheapest? To find out, we issue a query that joins the reviews and locations tables, filters the data to just wines from Virginia, narrows down the columns to just title, points, and price, and sorts first by points, then by price. We sort by points in descending order so the best wines appear first, and we sort by price in ascending order so that the cheapest wines appear first. The syntax for this query is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>King Family 2015 Orange Viognier (Monticello)</td>\n",
       "      <td>92</td>\n",
       "      <td>35.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Lovingston 2012 Josie's Knoll Merlot (Monticello)</td>\n",
       "      <td>91</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Lovingston 2015 Josie's Knoll Rotunda Red (Mon...</td>\n",
       "      <td>90</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Barboursville Vineyards 2015 Reserve Cabernet ...</td>\n",
       "      <td>90</td>\n",
       "      <td>25.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>King Family 2012 Meritage (Monticello)</td>\n",
       "      <td>90</td>\n",
       "      <td>31.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>374</th>\n",
       "      <td>Narmada 2013 Reserve Cabernet Franc (Virginia)</td>\n",
       "      <td>82</td>\n",
       "      <td>34.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>375</th>\n",
       "      <td>Veramar 2009 Chardonnay (Virginia)</td>\n",
       "      <td>81</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>376</th>\n",
       "      <td>Bogati 2013 Black Label Club Fumé Blanc Sauvig...</td>\n",
       "      <td>81</td>\n",
       "      <td>26.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>377</th>\n",
       "      <td>Three Fox 2014 Calabrese Pinot Grigio (Middleb...</td>\n",
       "      <td>81</td>\n",
       "      <td>28.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>378</th>\n",
       "      <td>Winery at La Grange 2012 Cabernet Sauvignon (V...</td>\n",
       "      <td>81</td>\n",
       "      <td>43.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>379 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                 title  points  price\n",
       "0        King Family 2015 Orange Viognier (Monticello)      92   35.0\n",
       "1    Lovingston 2012 Josie's Knoll Merlot (Monticello)      91   20.0\n",
       "2    Lovingston 2015 Josie's Knoll Rotunda Red (Mon...      90   20.0\n",
       "3    Barboursville Vineyards 2015 Reserve Cabernet ...      90   25.0\n",
       "4               King Family 2012 Meritage (Monticello)      90   31.0\n",
       "..                                                 ...     ...    ...\n",
       "374     Narmada 2013 Reserve Cabernet Franc (Virginia)      82   34.0\n",
       "375                 Veramar 2009 Chardonnay (Virginia)      81   18.0\n",
       "376  Bogati 2013 Black Label Club Fumé Blanc Sauvig...      81   26.0\n",
       "377  Three Fox 2014 Calabrese Pinot Grigio (Middleb...      81   28.0\n",
       "378  Winery at La Grange 2012 Cabernet Sauvignon (V...      81   43.0\n",
       "\n",
       "[379 rows x 3 columns]"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT r.title, r.points, r.price FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "WHERE province = 'Virginia'\n",
    "ORDER BY points DESC, price ASC;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
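  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tie-breaking behavior of a multi-column `ORDER BY` is easier to see on a handful of rows. The following is a minimal sketch under stated assumptions: the five wines and their values are invented for illustration, and an in-memory SQLite table stands in for the real reviews table.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Assumption: made-up rows in an in-memory SQLite table, for illustration only.\n",
    "con = sqlite3.connect(':memory:')\n",
    "con.execute('CREATE TABLE reviews (title TEXT, points INTEGER, price REAL)')\n",
    "con.executemany('INSERT INTO reviews VALUES (?, ?, ?)', [\n",
    "    ('Wine A', 90, 31.0),\n",
    "    ('Wine B', 92, 35.0),\n",
    "    ('Wine C', 90, 20.0),\n",
    "    ('Wine D', 91, 20.0),\n",
    "    ('Wine E', 90, 25.0),\n",
    "])\n",
    "\n",
    "# Highest points first; among wines with equal points, cheapest first.\n",
    "query = 'SELECT title, points, price FROM reviews ORDER BY points DESC, price ASC'\n",
    "ranked = con.execute(query).fetchall()\n",
    "for row in ranked:\n",
    "    print(row)\n",
    "```\n",
    "\n",
    "The three wines with 90 points are tied on the first sort column, so their relative order is decided entirely by price."
   ]
  },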
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Renaming Columns and Transforming Data Values\n",
    "Sometimes it is useful to rename the columns in the output data within a query. To rename a column, use the `AS` syntax while referencing the columns in `SELECT`. For example, we can load the title, variety, and points columns from the reviews table, but we can rename these columns name, type, and score respectively:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>name</th>\n",
       "      <th>type</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Casas del Bosque 2011 Reserva Sauvignon Blanc ...</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Marqués de Terán 2009 Selección Especial  (Rioja)</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Maurice Ecard 2009  Bourgogne</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>McGregor 2010 Semi-Dry Riesling (Finger Lakes)</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jules Taylor 2009 Ballochdale Estate Pinot Noi...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>Glenora 2010 Gewürztraminer (Finger Lakes)</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>Hard Row To Hoe 2010 Marsanne (Yakima Valley)</td>\n",
       "      <td>Marsanne</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>Animale 2009 Dolcetto (Columbia Valley (WA))</td>\n",
       "      <td>Dolcetto</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>Beresan 2008 The Buzz Yellow Jacket Vineyard R...</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>Cabot Vineyards 2007 Syrah (Humboldt County)</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 3 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                     name             type  \\\n",
       "0       Casas del Bosque 2011 Reserva Sauvignon Blanc ...  Sauvignon Blanc   \n",
       "1       Marqués de Terán 2009 Selección Especial  (Rioja)      Tempranillo   \n",
       "2                           Maurice Ecard 2009  Bourgogne       Chardonnay   \n",
       "3          McGregor 2010 Semi-Dry Riesling (Finger Lakes)         Riesling   \n",
       "4       Jules Taylor 2009 Ballochdale Estate Pinot Noi...       Pinot Noir   \n",
       "...                                                   ...              ...   \n",
       "103722         Glenora 2010 Gewürztraminer (Finger Lakes)   Gewürztraminer   \n",
       "103723      Hard Row To Hoe 2010 Marsanne (Yakima Valley)         Marsanne   \n",
       "103724       Animale 2009 Dolcetto (Columbia Valley (WA))         Dolcetto   \n",
       "103725  Beresan 2008 The Buzz Yellow Jacket Vineyard R...        Red Blend   \n",
       "103726       Cabot Vineyards 2007 Syrah (Humboldt County)            Syrah   \n",
       "\n",
       "        score  \n",
       "0          86  \n",
       "1          86  \n",
       "2          86  \n",
       "3          86  \n",
       "4          86  \n",
       "...       ...  \n",
       "103722     86  \n",
       "103723     86  \n",
       "103724     86  \n",
       "103725     86  \n",
       "103726     86  \n",
       "\n",
       "[103727 rows x 3 columns]"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT title AS name, variety AS type, points AS score FROM reviews;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In other situations, we might want to transform a column arithmetically or in another way. SQL supports the standard arithmetic operators: `+` for addition, `-` for subtraction, `*` for multiplication, and `/` for division. SQL also supports the modulo operator `%` to return the remainder after division (`16 % 5` equals 1, for example, because 16 divided by 5 yields a remainder of 1). SQL also allows the following [arithmetic functions](https://www.w3schools.com/sql/func_sqlserver_pi.asp):\n",
    "\n",
    "* `EXP(a)` - raises `a` the argument to the power of $e = 2.718...$\n",
    "* `POWER(a,b)` - raises `a` to the power of `b`\n",
    "* `LOG(a)` - takes the natural (base $e$) logarithm of `a`\n",
    "* `LOG10(a)` - takes the common (base 10) logarithm of `a`\n",
    "* `SQRT(a)` - takes the square root of `a`\n",
    "* `ABS(a)` - takes the absolute value of `a`\n",
    "* `CEILING(a)` - rounds values of `a` up to the next whole number\n",
    "* `FLOOR(a)` - rounds values of `a` down to a whole number\n",
    "* `ROUND(a, k)` - rounds values of `a` up or down to the nearest number with `k` decimals\n",
    "* `SIGN(a)` - returns 1 if `a` is positive, -1 if `a` is negative, and 0 if `a` is 0\n",
    "\n",
    "In addition, if you need them, there are many trigonometric functions built into standard SQL.\n",
    "\n",
    "When using a function that operates on a column, it is important to use `AS` to name the new column, as SQL has no way to choose a logical name automatically for constructed columns and uses `?column?` be default.\n",
    "\n",
    "For example, we can convert the price of each wine from dollars to Euros by multiplying the price by the .91 USD to Euro exchange rate. We keep the original price in the query but rename it `price_dollars`, and we name the converted price `price_euros`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price_dollars</th>\n",
       "      <th>price_euros</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Casas del Bosque 2011 Reserva Sauvignon Blanc ...</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>12.0</td>\n",
       "      <td>10.92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Marqués de Terán 2009 Selección Especial  (Rioja)</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>24.0</td>\n",
       "      <td>21.84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Maurice Ecard 2009  Bourgogne</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>McGregor 2010 Semi-Dry Riesling (Finger Lakes)</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>18.0</td>\n",
       "      <td>16.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jules Taylor 2009 Ballochdale Estate Pinot Noi...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>22.0</td>\n",
       "      <td>20.02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>Glenora 2010 Gewürztraminer (Finger Lakes)</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>15.0</td>\n",
       "      <td>13.65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>Hard Row To Hoe 2010 Marsanne (Yakima Valley)</td>\n",
       "      <td>Marsanne</td>\n",
       "      <td>18.0</td>\n",
       "      <td>16.38</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>Animale 2009 Dolcetto (Columbia Valley (WA))</td>\n",
       "      <td>Dolcetto</td>\n",
       "      <td>24.0</td>\n",
       "      <td>21.84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>Beresan 2008 The Buzz Yellow Jacket Vineyard R...</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>19.0</td>\n",
       "      <td>17.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>Cabot Vineyards 2007 Syrah (Humboldt County)</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>24.0</td>\n",
       "      <td>21.84</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    title          variety  \\\n",
       "0       Casas del Bosque 2011 Reserva Sauvignon Blanc ...  Sauvignon Blanc   \n",
       "1       Marqués de Terán 2009 Selección Especial  (Rioja)      Tempranillo   \n",
       "2                           Maurice Ecard 2009  Bourgogne       Chardonnay   \n",
       "3          McGregor 2010 Semi-Dry Riesling (Finger Lakes)         Riesling   \n",
       "4       Jules Taylor 2009 Ballochdale Estate Pinot Noi...       Pinot Noir   \n",
       "...                                                   ...              ...   \n",
       "103722         Glenora 2010 Gewürztraminer (Finger Lakes)   Gewürztraminer   \n",
       "103723      Hard Row To Hoe 2010 Marsanne (Yakima Valley)         Marsanne   \n",
       "103724       Animale 2009 Dolcetto (Columbia Valley (WA))         Dolcetto   \n",
       "103725  Beresan 2008 The Buzz Yellow Jacket Vineyard R...        Red Blend   \n",
       "103726       Cabot Vineyards 2007 Syrah (Humboldt County)            Syrah   \n",
       "\n",
       "        price_dollars  price_euros  \n",
       "0                12.0        10.92  \n",
       "1                24.0        21.84  \n",
       "2                 NaN          NaN  \n",
       "3                18.0        16.38  \n",
       "4                22.0        20.02  \n",
       "...               ...          ...  \n",
       "103722           15.0        13.65  \n",
       "103723           18.0        16.38  \n",
       "103724           24.0        21.84  \n",
       "103725           19.0        17.29  \n",
       "103726           24.0        21.84  \n",
       "\n",
       "[103727 rows x 4 columns]"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT title, variety, price AS price_dollars, .91*price AS price_euros FROM reviews;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
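  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Two details of the query above are worth checking on a small example: the `AS` aliases become the column names of the result, and arithmetic on a missing price yields a missing value rather than an error. The following is a minimal sketch, assuming an in-memory SQLite table with three invented rows in place of the real reviews table; the `ROUND` call is an addition for tidy output and is not part of the original query.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Assumption: made-up rows in an in-memory SQLite table, for illustration only.\n",
    "con = sqlite3.connect(':memory:')\n",
    "con.execute('CREATE TABLE reviews (title TEXT, price REAL)')\n",
    "con.executemany('INSERT INTO reviews VALUES (?, ?)',\n",
    "                [('Wine A', 12.0), ('Wine B', 24.0), ('Wine C', None)])\n",
    "\n",
    "query = '''\n",
    "SELECT title,\n",
    "       price AS price_dollars,\n",
    "       ROUND(0.91 * price, 2) AS price_euros\n",
    "FROM reviews\n",
    "'''\n",
    "cursor = con.execute(query)\n",
    "# cursor.description holds one entry per output column; the name comes first.\n",
    "column_names = [col[0] for col in cursor.description]\n",
    "rows = cursor.fetchall()\n",
    "\n",
    "print(column_names)\n",
    "for row in rows:\n",
    "    print(row)\n",
    "```\n",
    "\n",
    "The row with a `NULL` price comes back with `None` in both price columns, which pandas would display as `NaN`."
   ]
  },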
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For no reason other than to demonstrate the use of the various mathematical functions, we can put many transformations of price in one dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>price</th>\n",
       "      <th>price_exp</th>\n",
       "      <th>price_natlog</th>\n",
       "      <th>price_commonlog</th>\n",
       "      <th>price_loground</th>\n",
       "      <th>price_sqrt</th>\n",
       "      <th>price_squared</th>\n",
       "      <th>price_cubed</th>\n",
       "      <th>price_morethan50</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>12.0</td>\n",
       "      <td>1.012072</td>\n",
       "      <td>1.079181</td>\n",
       "      <td>1.079181</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.464102</td>\n",
       "      <td>144.0</td>\n",
       "      <td>1728.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>24.0</td>\n",
       "      <td>1.024290</td>\n",
       "      <td>1.380211</td>\n",
       "      <td>1.380211</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.898979</td>\n",
       "      <td>576.0</td>\n",
       "      <td>13824.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>18.0</td>\n",
       "      <td>1.018163</td>\n",
       "      <td>1.255273</td>\n",
       "      <td>1.255273</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.242641</td>\n",
       "      <td>324.0</td>\n",
       "      <td>5832.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>22.0</td>\n",
       "      <td>1.022244</td>\n",
       "      <td>1.342423</td>\n",
       "      <td>1.342423</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.690416</td>\n",
       "      <td>484.0</td>\n",
       "      <td>10648.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>15.0</td>\n",
       "      <td>1.015113</td>\n",
       "      <td>1.176091</td>\n",
       "      <td>1.176091</td>\n",
       "      <td>1.0</td>\n",
       "      <td>3.872983</td>\n",
       "      <td>225.0</td>\n",
       "      <td>3375.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>18.0</td>\n",
       "      <td>1.018163</td>\n",
       "      <td>1.255273</td>\n",
       "      <td>1.255273</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.242641</td>\n",
       "      <td>324.0</td>\n",
       "      <td>5832.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>24.0</td>\n",
       "      <td>1.024290</td>\n",
       "      <td>1.380211</td>\n",
       "      <td>1.380211</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.898979</td>\n",
       "      <td>576.0</td>\n",
       "      <td>13824.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>19.0</td>\n",
       "      <td>1.019182</td>\n",
       "      <td>1.278754</td>\n",
       "      <td>1.278754</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.358899</td>\n",
       "      <td>361.0</td>\n",
       "      <td>6859.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>24.0</td>\n",
       "      <td>1.024290</td>\n",
       "      <td>1.380211</td>\n",
       "      <td>1.380211</td>\n",
       "      <td>1.0</td>\n",
       "      <td>4.898979</td>\n",
       "      <td>576.0</td>\n",
       "      <td>13824.0</td>\n",
       "      <td>-1.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 9 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "        price  price_exp  price_natlog  price_commonlog  price_loground  \\\n",
       "0        12.0   1.012072      1.079181         1.079181             1.0   \n",
       "1        24.0   1.024290      1.380211         1.380211             1.0   \n",
       "2         NaN        NaN           NaN              NaN             NaN   \n",
       "3        18.0   1.018163      1.255273         1.255273             1.0   \n",
       "4        22.0   1.022244      1.342423         1.342423             1.0   \n",
       "...       ...        ...           ...              ...             ...   \n",
       "103722   15.0   1.015113      1.176091         1.176091             1.0   \n",
       "103723   18.0   1.018163      1.255273         1.255273             1.0   \n",
       "103724   24.0   1.024290      1.380211         1.380211             1.0   \n",
       "103725   19.0   1.019182      1.278754         1.278754             1.0   \n",
       "103726   24.0   1.024290      1.380211         1.380211             1.0   \n",
       "\n",
       "        price_sqrt  price_squared  price_cubed  price_morethan50  \n",
       "0         3.464102          144.0       1728.0              -1.0  \n",
       "1         4.898979          576.0      13824.0              -1.0  \n",
       "2              NaN            NaN          NaN               NaN  \n",
       "3         4.242641          324.0       5832.0              -1.0  \n",
       "4         4.690416          484.0      10648.0              -1.0  \n",
       "...            ...            ...          ...               ...  \n",
       "103722    3.872983          225.0       3375.0              -1.0  \n",
       "103723    4.242641          324.0       5832.0              -1.0  \n",
       "103724    4.898979          576.0      13824.0              -1.0  \n",
       "103725    4.358899          361.0       6859.0              -1.0  \n",
       "103726    4.898979          576.0      13824.0              -1.0  \n",
       "\n",
       "[103727 rows x 9 columns]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT price,\n",
    "    EXP(price/1000) as price_exp,\n",
    "    LOG(price) as price_natlog,\n",
    "    LOG10(price) as price_commonlog,\n",
    "    ROUND(LOG(price)) as price_loground,\n",
    "    SQRT(price) as price_sqrt,\n",
    "    POWER(price, 2) as price_squared,\n",
    "    POWER(price, 3) as price_cubed,\n",
    "    SIGN(price - 50) as price_morethan50\n",
    "FROM reviews;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another useful operation for transforming columns in a read query is `CASE`, which maps the values of a column to categories according to logical conditions. The syntax that uses `CASE` within `SELECT` is\n",
    "```\n",
    "SELECT CASE\n",
    "    WHEN logicalstatement1 THEN value1\n",
    "    WHEN logicalstatement2 THEN value2\n",
    "    WHEN logicalstatement3 THEN value3\n",
    "    ELSE value4\n",
    "    END AS name\n",
    "```\n",
    "This code evaluates the logical statements in the order they are listed, and fills in the datapoint with the value paired with the first statement that is true. If more than one of the logical statements is true, the statement/value pair listed first takes precedence. If none of the logical statements are true, then the datapoint is filled in with the value listed with `ELSE`. As before, it is important to name the new column with `AS`.\n",
    "\n",
    "For example, if we want to categorize wines as cheap when the price is under 20 dollars, moderately priced if the price is between 20 and 50 dollars, and expensive if the price is more than 50 dollars, we can type:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price</th>\n",
       "      <th>price_level</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Casas del Bosque 2011 Reserva Sauvignon Blanc ...</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>12.0</td>\n",
       "      <td>cheap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Marqués de Terán 2009 Selección Especial  (Rioja)</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>24.0</td>\n",
       "      <td>moderately priced</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Maurice Ecard 2009  Bourgogne</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>McGregor 2010 Semi-Dry Riesling (Finger Lakes)</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>18.0</td>\n",
       "      <td>cheap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jules Taylor 2009 Ballochdale Estate Pinot Noi...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>22.0</td>\n",
       "      <td>moderately priced</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>Glenora 2010 Gewürztraminer (Finger Lakes)</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>15.0</td>\n",
       "      <td>cheap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>Hard Row To Hoe 2010 Marsanne (Yakima Valley)</td>\n",
       "      <td>Marsanne</td>\n",
       "      <td>18.0</td>\n",
       "      <td>cheap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>Animale 2009 Dolcetto (Columbia Valley (WA))</td>\n",
       "      <td>Dolcetto</td>\n",
       "      <td>24.0</td>\n",
       "      <td>moderately priced</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>Beresan 2008 The Buzz Yellow Jacket Vineyard R...</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>19.0</td>\n",
       "      <td>cheap</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>Cabot Vineyards 2007 Syrah (Humboldt County)</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>24.0</td>\n",
       "      <td>moderately priced</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    title          variety  \\\n",
       "0       Casas del Bosque 2011 Reserva Sauvignon Blanc ...  Sauvignon Blanc   \n",
       "1       Marqués de Terán 2009 Selección Especial  (Rioja)      Tempranillo   \n",
       "2                           Maurice Ecard 2009  Bourgogne       Chardonnay   \n",
       "3          McGregor 2010 Semi-Dry Riesling (Finger Lakes)         Riesling   \n",
       "4       Jules Taylor 2009 Ballochdale Estate Pinot Noi...       Pinot Noir   \n",
       "...                                                   ...              ...   \n",
       "103722         Glenora 2010 Gewürztraminer (Finger Lakes)   Gewürztraminer   \n",
       "103723      Hard Row To Hoe 2010 Marsanne (Yakima Valley)         Marsanne   \n",
       "103724       Animale 2009 Dolcetto (Columbia Valley (WA))         Dolcetto   \n",
       "103725  Beresan 2008 The Buzz Yellow Jacket Vineyard R...        Red Blend   \n",
       "103726       Cabot Vineyards 2007 Syrah (Humboldt County)            Syrah   \n",
       "\n",
       "        price        price_level  \n",
       "0        12.0              cheap  \n",
       "1        24.0  moderately priced  \n",
       "2         NaN               None  \n",
       "3        18.0              cheap  \n",
       "4        22.0  moderately priced  \n",
       "...       ...                ...  \n",
       "103722   15.0              cheap  \n",
       "103723   18.0              cheap  \n",
       "103724   24.0  moderately priced  \n",
       "103725   19.0              cheap  \n",
       "103726   24.0  moderately priced  \n",
       "\n",
       "[103727 rows x 4 columns]"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery=\"\"\"\n",
    "SELECT title, variety, price, CASE\n",
    "    WHEN price < 20 THEN 'cheap'\n",
    "    WHEN price BETWEEN 20 AND 50 THEN 'moderately priced'\n",
    "    WHEN price > 50 THEN 'expensive'\n",
    "    ELSE NULL\n",
    "    END AS price_level\n",
    "FROM reviews;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There's an important **point of caution** when using `CASE`. Unless you account for missing values explicitly in your query, they fail every `WHEN` condition and are assigned whatever value is listed with `ELSE`. If `ELSE` assigns a real category, that will corrupt the data. It is best practice to write a condition for every category across the full set of observed values of a column, and to end the call to `CASE` with `ELSE NULL`, so that when none of the conditions apply, the new column is also missing."
   ]
  },
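  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why this matters, here is a minimal sketch that is not part of the wine database. It assumes a hypothetical three-row `reviews` table held in an in-memory SQLite database through Python's built-in `sqlite3` module, so the rows and the names `risky` and `safer` are illustrative only. A comparison with `NULL` is never true, so a row with a missing price falls through every `WHEN` to `ELSE`.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Hypothetical stand-in for the reviews table (assumption: SQLite, in memory)\n",
    "con = sqlite3.connect(':memory:')\n",
    "con.execute('CREATE TABLE reviews (title TEXT, price REAL)')\n",
    "con.executemany(\n",
    "    'INSERT INTO reviews VALUES (?, ?)',\n",
    "    [('wine A', 12.0), ('wine B', None), ('wine C', 75.0)],\n",
    ")\n",
    "\n",
    "# Risky: ELSE assigns a real category, so the missing price is labeled too\n",
    "risky = '''\n",
    "SELECT title, CASE\n",
    "    WHEN price < 20 THEN 'cheap'\n",
    "    WHEN price BETWEEN 20 AND 50 THEN 'moderately priced'\n",
    "    ELSE 'expensive'\n",
    "    END AS price_level\n",
    "FROM reviews ORDER BY title;\n",
    "'''\n",
    "\n",
    "# Safer: every category has its own condition, and ELSE NULL keeps\n",
    "# missing prices missing in the new column\n",
    "safer = '''\n",
    "SELECT title, CASE\n",
    "    WHEN price < 20 THEN 'cheap'\n",
    "    WHEN price BETWEEN 20 AND 50 THEN 'moderately priced'\n",
    "    WHEN price > 50 THEN 'expensive'\n",
    "    ELSE NULL\n",
    "    END AS price_level\n",
    "FROM reviews ORDER BY title;\n",
    "'''\n",
    "\n",
    "risky_rows = con.execute(risky).fetchall()\n",
    "safer_rows = con.execute(safer).fetchall()\n",
    "print(risky_rows)\n",
    "print(safer_rows)\n",
    "```\n",
    "\n",
    "In this sketch the first query labels the wine with the missing price as expensive, while the second leaves its `price_level` missing."
   ]
  },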
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are also functions that apply to columns with [character](https://www.geeksforgeeks.org/sql-character-functions-examples/) values:\n",
    "\n",
    "* `LOWER(a)` - converts all characters in `a` to lowercase\n",
    "* `UPPER(a)` - converts all characters in `a` to uppercase\n",
    "* `INITCAP(a)` - converts the first letter of every word in `a` to uppercase\n",
    "* `CONCAT(a,b,c)` - appends the string `b` to the end of `a`, and `c` (if included) to the end of `b`\n",
    "* `LENGTH(a)` - reports the number of characters in the string `a`\n",
    "* `SUBSTR(a, start, length)` - restricts the string `a` to a substring of `length` characters, beginning at the position denoted by `start` (the first character is position 1)\n",
    "* `TRIM(a)` - removes spaces at the beginning and end of string `a`\n",
    "* `REPLACE(a, oldtext, newtext)` - searches values of `a` for occurrences of `oldtext` and replaces them with `newtext`\n",
    "\n",
    "For example, we can convert the descriptions in the reviews table to all capitals, all lower-case letters, or capitals for the first letter of each word:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price</th>\n",
       "      <th>description_upper</th>\n",
       "      <th>description_lower</th>\n",
       "      <th>description_initcap</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Casas del Bosque 2011 Reserva Sauvignon Blanc ...</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>12.0</td>\n",
       "      <td>IT'S PRETTY EASY PEGGING THIS FOR CHILEAN SB; ...</td>\n",
       "      <td>it's pretty easy pegging this for chilean sb; ...</td>\n",
       "      <td>It'S Pretty Easy Pegging This For Chilean Sb; ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Marqués de Terán 2009 Selección Especial  (Rioja)</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>24.0</td>\n",
       "      <td>OPAQUE IN COLOR, WITH BLACKBERRY AND LICORICE ...</td>\n",
       "      <td>opaque in color, with blackberry and licorice ...</td>\n",
       "      <td>Opaque In Color, With Blackberry And Licorice ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Maurice Ecard 2009  Bourgogne</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ATTRACTIVE RIPE FRUITS GO WITH LIME AND TOAST ...</td>\n",
       "      <td>attractive ripe fruits go with lime and toast ...</td>\n",
       "      <td>Attractive Ripe Fruits Go With Lime And Toast ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>McGregor 2010 Semi-Dry Riesling (Finger Lakes)</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>18.0</td>\n",
       "      <td>SMOKY AND A BIT STARK WITH WHIFFS OF STRUCK FL...</td>\n",
       "      <td>smoky and a bit stark with whiffs of struck fl...</td>\n",
       "      <td>Smoky And A Bit Stark With Whiffs Of Struck Fl...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jules Taylor 2009 Ballochdale Estate Pinot Noi...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>22.0</td>\n",
       "      <td>SOURCED FROM A HIGH-ALTITUDE VINEYARD NEAR THE...</td>\n",
       "      <td>sourced from a high-altitude vineyard near the...</td>\n",
       "      <td>Sourced From A High-Altitude Vineyard Near The...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>Glenora 2010 Gewürztraminer (Finger Lakes)</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>15.0</td>\n",
       "      <td>SWEET ON THE NOSE, WITH SCENTS OF PINK GRAPEFR...</td>\n",
       "      <td>sweet on the nose, with scents of pink grapefr...</td>\n",
       "      <td>Sweet On The Nose, With Scents Of Pink Grapefr...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>Hard Row To Hoe 2010 Marsanne (Yakima Valley)</td>\n",
       "      <td>Marsanne</td>\n",
       "      <td>18.0</td>\n",
       "      <td>WAXY FRUIT FLAVORS OF PEACH, MELON AND BANANA,...</td>\n",
       "      <td>waxy fruit flavors of peach, melon and banana,...</td>\n",
       "      <td>Waxy Fruit Flavors Of Peach, Melon And Banana,...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>Animale 2009 Dolcetto (Columbia Valley (WA))</td>\n",
       "      <td>Dolcetto</td>\n",
       "      <td>24.0</td>\n",
       "      <td>THIS ALCOHOLIC WINE (15.9%, AND YOU CAN TASTE ...</td>\n",
       "      <td>this alcoholic wine (15.9%, and you can taste ...</td>\n",
       "      <td>This Alcoholic Wine (15.9%, And You Can Taste ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>Beresan 2008 The Buzz Yellow Jacket Vineyard R...</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>19.0</td>\n",
       "      <td>LIGHT FRUIT FLAVORS RUN FROM MELON INTO PALE S...</td>\n",
       "      <td>light fruit flavors run from melon into pale s...</td>\n",
       "      <td>Light Fruit Flavors Run From Melon Into Pale S...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>Cabot Vineyards 2007 Syrah (Humboldt County)</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>24.0</td>\n",
       "      <td>FROM HUSBAND-AND-WIFE TEAM IN CALIFORNIA REDWO...</td>\n",
       "      <td>from husband-and-wife team in california redwo...</td>\n",
       "      <td>From Husband-And-Wife Team In California Redwo...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    title          variety  \\\n",
       "0       Casas del Bosque 2011 Reserva Sauvignon Blanc ...  Sauvignon Blanc   \n",
       "1       Marqués de Terán 2009 Selección Especial  (Rioja)      Tempranillo   \n",
       "2                           Maurice Ecard 2009  Bourgogne       Chardonnay   \n",
       "3          McGregor 2010 Semi-Dry Riesling (Finger Lakes)         Riesling   \n",
       "4       Jules Taylor 2009 Ballochdale Estate Pinot Noi...       Pinot Noir   \n",
       "...                                                   ...              ...   \n",
       "103722         Glenora 2010 Gewürztraminer (Finger Lakes)   Gewürztraminer   \n",
       "103723      Hard Row To Hoe 2010 Marsanne (Yakima Valley)         Marsanne   \n",
       "103724       Animale 2009 Dolcetto (Columbia Valley (WA))         Dolcetto   \n",
       "103725  Beresan 2008 The Buzz Yellow Jacket Vineyard R...        Red Blend   \n",
       "103726       Cabot Vineyards 2007 Syrah (Humboldt County)            Syrah   \n",
       "\n",
       "        price                                  description_upper  \\\n",
       "0        12.0  IT'S PRETTY EASY PEGGING THIS FOR CHILEAN SB; ...   \n",
       "1        24.0  OPAQUE IN COLOR, WITH BLACKBERRY AND LICORICE ...   \n",
       "2         NaN  ATTRACTIVE RIPE FRUITS GO WITH LIME AND TOAST ...   \n",
       "3        18.0  SMOKY AND A BIT STARK WITH WHIFFS OF STRUCK FL...   \n",
       "4        22.0  SOURCED FROM A HIGH-ALTITUDE VINEYARD NEAR THE...   \n",
       "...       ...                                                ...   \n",
       "103722   15.0  SWEET ON THE NOSE, WITH SCENTS OF PINK GRAPEFR...   \n",
       "103723   18.0  WAXY FRUIT FLAVORS OF PEACH, MELON AND BANANA,...   \n",
       "103724   24.0  THIS ALCOHOLIC WINE (15.9%, AND YOU CAN TASTE ...   \n",
       "103725   19.0  LIGHT FRUIT FLAVORS RUN FROM MELON INTO PALE S...   \n",
       "103726   24.0  FROM HUSBAND-AND-WIFE TEAM IN CALIFORNIA REDWO...   \n",
       "\n",
       "                                        description_lower  \\\n",
       "0       it's pretty easy pegging this for chilean sb; ...   \n",
       "1       opaque in color, with blackberry and licorice ...   \n",
       "2       attractive ripe fruits go with lime and toast ...   \n",
       "3       smoky and a bit stark with whiffs of struck fl...   \n",
       "4       sourced from a high-altitude vineyard near the...   \n",
       "...                                                   ...   \n",
       "103722  sweet on the nose, with scents of pink grapefr...   \n",
       "103723  waxy fruit flavors of peach, melon and banana,...   \n",
       "103724  this alcoholic wine (15.9%, and you can taste ...   \n",
       "103725  light fruit flavors run from melon into pale s...   \n",
       "103726  from husband-and-wife team in california redwo...   \n",
       "\n",
       "                                      description_initcap  \n",
       "0       It'S Pretty Easy Pegging This For Chilean Sb; ...  \n",
       "1       Opaque In Color, With Blackberry And Licorice ...  \n",
       "2       Attractive Ripe Fruits Go With Lime And Toast ...  \n",
       "3       Smoky And A Bit Stark With Whiffs Of Struck Fl...  \n",
       "4       Sourced From A High-Altitude Vineyard Near The...  \n",
       "...                                                   ...  \n",
       "103722  Sweet On The Nose, With Scents Of Pink Grapefr...  \n",
       "103723  Waxy Fruit Flavors Of Peach, Melon And Banana,...  \n",
       "103724  This Alcoholic Wine (15.9%, And You Can Taste ...  \n",
       "103725  Light Fruit Flavors Run From Melon Into Pale S...  \n",
       "103726  From Husband-And-Wife Team In California Redwo...  \n",
       "\n",
       "[103727 rows x 6 columns]"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT title, variety, price, \n",
    "    UPPER(description) as description_upper, \n",
    "    LOWER(description) as description_lower, \n",
    "    INITCAP(description) as description_initcap \n",
    "FROM reviews;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can use `REPLACE()` to make the writing less artful, for example, by replacing the word \"aroma\" with \"good smell\" everywhere it appears in the wine descriptions. Note that `REPLACE()` is case-sensitive, so it is a good idea to convert the values to a consistent case like lowercase first:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Opaque in color, with blackberry and licorice aromas but also a distinct streak of brambly herbs and green. Later on, pine needle and tartness enter the fray. This is a commendable modern Rioja but it does have a few issues, namely a green herbal component.'"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT title, variety, price, description,\n",
    "    REPLACE(LOWER(description), 'aroma', 'good smell') as description_replace \n",
    "FROM reviews\n",
    "WHERE description LIKE '%%aroma%%';\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine).description[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'opaque in color, with blackberry and licorice good smells but also a distinct streak of brambly herbs and green. later on, pine needle and tartness enter the fray. this is a commendable modern rioja but it does have a few issues, namely a green herbal component.'"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.read_sql_query(myquery, con=engine).description_replace[0]"
   ]
  },
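  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The effect of case-sensitivity is easiest to see on a single string. The following is a minimal sketch under stated assumptions: it uses an in-memory SQLite database through Python's built-in `sqlite3` module rather than the wine database, and the sentence stored in `text` is made up for illustration. It compares `REPLACE()` applied to the text as written against `REPLACE()` applied after `LOWER()`.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Assumption: SQLite in memory, used only to evaluate the two expressions\n",
    "con = sqlite3.connect(':memory:')\n",
    "\n",
    "# Made-up sentence with one capitalized and one lowercase match\n",
    "text = 'Aromas of plum lead to a smoky aroma.'\n",
    "\n",
    "query = '''\n",
    "SELECT REPLACE(?, 'aroma', 'good smell'),\n",
    "       REPLACE(LOWER(?), 'aroma', 'good smell');\n",
    "'''\n",
    "\n",
    "# as_is keeps the capitalized 'Aromas'; lowered replaces both occurrences\n",
    "as_is, lowered = con.execute(query, (text, text)).fetchone()\n",
    "print(as_is)\n",
    "print(lowered)\n",
    "```\n",
    "\n",
    "Without `LOWER()`, the capitalized occurrence at the start of the sentence is left untouched, which is why converting to a consistent case first is the safer habit."
   ]
  },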
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `SUBSTR()` function can be used to extract parts of a string. The following code extracts from each value of the `description` column the substring that begins at the 5th character and is 10 characters long:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price</th>\n",
       "      <th>description</th>\n",
       "      <th>description_substr</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Casas del Bosque 2011 Reserva Sauvignon Blanc ...</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>12.0</td>\n",
       "      <td>It's pretty easy pegging this for Chilean SB; ...</td>\n",
       "      <td>pretty ea</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Marqués de Terán 2009 Selección Especial  (Rioja)</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>24.0</td>\n",
       "      <td>Opaque in color, with blackberry and licorice ...</td>\n",
       "      <td>ue in colo</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Maurice Ecard 2009  Bourgogne</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Attractive ripe fruits go with lime and toast ...</td>\n",
       "      <td>active rip</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>McGregor 2010 Semi-Dry Riesling (Finger Lakes)</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>18.0</td>\n",
       "      <td>Smoky and a bit stark with whiffs of struck fl...</td>\n",
       "      <td>y and a bi</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jules Taylor 2009 Ballochdale Estate Pinot Noi...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>22.0</td>\n",
       "      <td>Sourced from a high-altitude vineyard near the...</td>\n",
       "      <td>ced from a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>Glenora 2010 Gewürztraminer (Finger Lakes)</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>15.0</td>\n",
       "      <td>Sweet on the nose, with scents of pink grapefr...</td>\n",
       "      <td>t on the n</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>Hard Row To Hoe 2010 Marsanne (Yakima Valley)</td>\n",
       "      <td>Marsanne</td>\n",
       "      <td>18.0</td>\n",
       "      <td>Waxy fruit flavors of peach, melon and banana,...</td>\n",
       "      <td>fruit fla</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>Animale 2009 Dolcetto (Columbia Valley (WA))</td>\n",
       "      <td>Dolcetto</td>\n",
       "      <td>24.0</td>\n",
       "      <td>This alcoholic wine (15.9%, and you can taste ...</td>\n",
       "      <td>alcoholic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>Beresan 2008 The Buzz Yellow Jacket Vineyard R...</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>19.0</td>\n",
       "      <td>Light fruit flavors run from melon into pale s...</td>\n",
       "      <td>t fruit fl</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>Cabot Vineyards 2007 Syrah (Humboldt County)</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>24.0</td>\n",
       "      <td>From husband-and-wife team in California redwo...</td>\n",
       "      <td>husband-a</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    title          variety  \\\n",
       "0       Casas del Bosque 2011 Reserva Sauvignon Blanc ...  Sauvignon Blanc   \n",
       "1       Marqués de Terán 2009 Selección Especial  (Rioja)      Tempranillo   \n",
       "2                           Maurice Ecard 2009  Bourgogne       Chardonnay   \n",
       "3          McGregor 2010 Semi-Dry Riesling (Finger Lakes)         Riesling   \n",
       "4       Jules Taylor 2009 Ballochdale Estate Pinot Noi...       Pinot Noir   \n",
       "...                                                   ...              ...   \n",
       "103722         Glenora 2010 Gewürztraminer (Finger Lakes)   Gewürztraminer   \n",
       "103723      Hard Row To Hoe 2010 Marsanne (Yakima Valley)         Marsanne   \n",
       "103724       Animale 2009 Dolcetto (Columbia Valley (WA))         Dolcetto   \n",
       "103725  Beresan 2008 The Buzz Yellow Jacket Vineyard R...        Red Blend   \n",
       "103726       Cabot Vineyards 2007 Syrah (Humboldt County)            Syrah   \n",
       "\n",
       "        price                                        description  \\\n",
       "0        12.0  It's pretty easy pegging this for Chilean SB; ...   \n",
       "1        24.0  Opaque in color, with blackberry and licorice ...   \n",
       "2         NaN  Attractive ripe fruits go with lime and toast ...   \n",
       "3        18.0  Smoky and a bit stark with whiffs of struck fl...   \n",
       "4        22.0  Sourced from a high-altitude vineyard near the...   \n",
       "...       ...                                                ...   \n",
       "103722   15.0  Sweet on the nose, with scents of pink grapefr...   \n",
       "103723   18.0  Waxy fruit flavors of peach, melon and banana,...   \n",
       "103724   24.0  This alcoholic wine (15.9%, and you can taste ...   \n",
       "103725   19.0  Light fruit flavors run from melon into pale s...   \n",
       "103726   24.0  From husband-and-wife team in California redwo...   \n",
       "\n",
       "       description_substr  \n",
       "0               pretty ea  \n",
       "1              ue in colo  \n",
       "2              active rip  \n",
       "3              y and a bi  \n",
       "4              ced from a  \n",
       "...                   ...  \n",
       "103722         t on the n  \n",
       "103723          fruit fla  \n",
       "103724          alcoholic  \n",
       "103725         t fruit fl  \n",
       "103726          husband-a  \n",
       "\n",
       "[103727 rows x 5 columns]"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT title, variety, price, description,\n",
    "    SUBSTR(description, 5, 10) as description_substr \n",
    "FROM reviews;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the wine database, province and country are stored in separate columns in the locations table. If we wanted to put these two pieces of information together in one readable column, we can use `CONCAT()`. In this example, I type `CONCAT(l.province, ', ', l.country)` which appends three strings - the province from the locations table, a comma and a space, and the country from the locations table - and names the new column `place`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>price</th>\n",
       "      <th>place</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Casas del Bosque 2011 Reserva Sauvignon Blanc ...</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Casablanca Valley, Chile</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Marqués de Terán 2009 Selección Especial  (Rioja)</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>24.0</td>\n",
       "      <td>Northern Spain, Spain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Maurice Ecard 2009  Bourgogne</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Burgundy, France</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>McGregor 2010 Semi-Dry Riesling (Finger Lakes)</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>18.0</td>\n",
       "      <td>New York, US</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jules Taylor 2009 Ballochdale Estate Pinot Noi...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>22.0</td>\n",
       "      <td>Marlborough, New Zealand</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>Glenora 2010 Gewürztraminer (Finger Lakes)</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>15.0</td>\n",
       "      <td>New York, US</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>Hard Row To Hoe 2010 Marsanne (Yakima Valley)</td>\n",
       "      <td>Marsanne</td>\n",
       "      <td>18.0</td>\n",
       "      <td>Washington, US</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>Animale 2009 Dolcetto (Columbia Valley (WA))</td>\n",
       "      <td>Dolcetto</td>\n",
       "      <td>24.0</td>\n",
       "      <td>Washington, US</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>Beresan 2008 The Buzz Yellow Jacket Vineyard R...</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>19.0</td>\n",
       "      <td>Washington, US</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>Cabot Vineyards 2007 Syrah (Humboldt County)</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>24.0</td>\n",
       "      <td>California, US</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    title          variety  \\\n",
       "0       Casas del Bosque 2011 Reserva Sauvignon Blanc ...  Sauvignon Blanc   \n",
       "1       Marqués de Terán 2009 Selección Especial  (Rioja)      Tempranillo   \n",
       "2                           Maurice Ecard 2009  Bourgogne       Chardonnay   \n",
       "3          McGregor 2010 Semi-Dry Riesling (Finger Lakes)         Riesling   \n",
       "4       Jules Taylor 2009 Ballochdale Estate Pinot Noi...       Pinot Noir   \n",
       "...                                                   ...              ...   \n",
       "103722         Glenora 2010 Gewürztraminer (Finger Lakes)   Gewürztraminer   \n",
       "103723      Hard Row To Hoe 2010 Marsanne (Yakima Valley)         Marsanne   \n",
       "103724       Animale 2009 Dolcetto (Columbia Valley (WA))         Dolcetto   \n",
       "103725  Beresan 2008 The Buzz Yellow Jacket Vineyard R...        Red Blend   \n",
       "103726       Cabot Vineyards 2007 Syrah (Humboldt County)            Syrah   \n",
       "\n",
       "        price                     place  \n",
       "0        12.0  Casablanca Valley, Chile  \n",
       "1        24.0     Northern Spain, Spain  \n",
       "2         NaN          Burgundy, France  \n",
       "3        18.0              New York, US  \n",
       "4        22.0  Marlborough, New Zealand  \n",
       "...       ...                       ...  \n",
       "103722   15.0              New York, US  \n",
       "103723   18.0            Washington, US  \n",
       "103724   24.0            Washington, US  \n",
       "103725   19.0            Washington, US  \n",
       "103726   24.0            California, US  \n",
       "\n",
       "[103727 rows x 4 columns]"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT r.title, r.variety, r.price, \n",
    "    CONCAT(l.province, ', ', l.country) as place \n",
    "FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "What are the shortest descriptions in the data? To find out we use the `LENGTH()` function to count the number of characters in each description, and sort these lengths in ascending order:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>description</th>\n",
       "      <th>length</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Craggy Range 2007 Kidnappers Vineyard Chardonn...</td>\n",
       "      <td>88</td>\n",
       "      <td>24.0</td>\n",
       "      <td>Imported by Kobrand.</td>\n",
       "      <td>20</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Chasing Venus 2007 Sauvignon Blanc (Marlborough)</td>\n",
       "      <td>88</td>\n",
       "      <td>16.0</td>\n",
       "      <td>Imported by JL Giguiere.</td>\n",
       "      <td>24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Philip Shaw 2007 No. 19 Sauvignon Blanc (Orange)</td>\n",
       "      <td>86</td>\n",
       "      <td>20.0</td>\n",
       "      <td>Imported by Lion Nathan USA.</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Peconic Bay Winery 2001 Riesling (North Fork o...</td>\n",
       "      <td>84</td>\n",
       "      <td>13.0</td>\n",
       "      <td>Review not available at this time.</td>\n",
       "      <td>34</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Mount Baker Vineyards 2006 Barrel Select Sangi...</td>\n",
       "      <td>82</td>\n",
       "      <td>16.0</td>\n",
       "      <td>Very light, could almost be a rosé.</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>René Muré 2015 Clos Saint Landelin Vorbourg Gr...</td>\n",
       "      <td>97</td>\n",
       "      <td>50.0</td>\n",
       "      <td>The heady aromatic scent of fresh tangerine pe...</td>\n",
       "      <td>698</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>Domaine Marcel Deiss 2009 Altenberg de Berghei...</td>\n",
       "      <td>95</td>\n",
       "      <td>66.0</td>\n",
       "      <td>Lifted notes of dried pear, dried chamomile fl...</td>\n",
       "      <td>699</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>De Toren 2014 Book 17 XVII Red (Stellenbosch)</td>\n",
       "      <td>95</td>\n",
       "      <td>330.0</td>\n",
       "      <td>Only 95 cases were made of this Bordeaux-style...</td>\n",
       "      <td>723</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>Domaine Ostertag 2015 Muenchberg Grand Cru Rie...</td>\n",
       "      <td>97</td>\n",
       "      <td>66.0</td>\n",
       "      <td>There is something incredibly fruity and simul...</td>\n",
       "      <td>753</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>Saggi 2007 Red (Columbia Valley (WA))</td>\n",
       "      <td>91</td>\n",
       "      <td>45.0</td>\n",
       "      <td>Dark, dusty, strongly scented with barrel toas...</td>\n",
       "      <td>829</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    title  points  price  \\\n",
       "0       Craggy Range 2007 Kidnappers Vineyard Chardonn...      88   24.0   \n",
       "1        Chasing Venus 2007 Sauvignon Blanc (Marlborough)      88   16.0   \n",
       "2        Philip Shaw 2007 No. 19 Sauvignon Blanc (Orange)      86   20.0   \n",
       "3       Peconic Bay Winery 2001 Riesling (North Fork o...      84   13.0   \n",
       "4       Mount Baker Vineyards 2006 Barrel Select Sangi...      82   16.0   \n",
       "...                                                   ...     ...    ...   \n",
       "103722  René Muré 2015 Clos Saint Landelin Vorbourg Gr...      97   50.0   \n",
       "103723  Domaine Marcel Deiss 2009 Altenberg de Berghei...      95   66.0   \n",
       "103724      De Toren 2014 Book 17 XVII Red (Stellenbosch)      95  330.0   \n",
       "103725  Domaine Ostertag 2015 Muenchberg Grand Cru Rie...      97   66.0   \n",
       "103726              Saggi 2007 Red (Columbia Valley (WA))      91   45.0   \n",
       "\n",
       "                                              description  length  \n",
       "0                                    Imported by Kobrand.      20  \n",
       "1                                Imported by JL Giguiere.      24  \n",
       "2                            Imported by Lion Nathan USA.      28  \n",
       "3                      Review not available at this time.      34  \n",
       "4                     Very light, could almost be a rosé.      35  \n",
       "...                                                   ...     ...  \n",
       "103722  The heady aromatic scent of fresh tangerine pe...     698  \n",
       "103723  Lifted notes of dried pear, dried chamomile fl...     699  \n",
       "103724  Only 95 cases were made of this Bordeaux-style...     723  \n",
       "103725  There is something incredibly fruity and simul...     753  \n",
       "103726  Dark, dusty, strongly scented with barrel toas...     829  \n",
       "\n",
       "[103727 rows x 5 columns]"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT title, points, price, description,\n",
    "    LENGTH(description) as length\n",
    "FROM reviews\n",
    "ORDER BY length ASC;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data Aggregation\n",
    "If there are columns in the data output that have repeated values, then each distinct value forms a group in the data. Data aggregation is the process of collapsing the data to one row for each group, while summarizing other columns by taking the within-group mean, sum, count, or another statistic. \n",
    "\n",
    "Aggregating data requires more attention to the ordering of the clauses in an SQL query than is necessary with other tasks. That is, certain clauses must be entered into the query in a particular order. An SQL query that aggregates data should follow this template:\n",
    "```\n",
    "SELECT aggregationfunctions FROM table\n",
    "(any joins happen here)\n",
    "(filtering rows with the WHERE clause happens here)\n",
    "GROUP BY groupingcolumns\n",
    "HAVING (a logical condition involving aggregation functions)\n",
    "(sorting with ORDER BY happens here);\n",
    "```\n",
    "Let's break down this template line by line. First,\n",
    "\n",
    "* `SELECT aggregationfunctions FROM table`\n",
    "\n",
    "Aggregation functions work like the arithmetic functions described above. The difference is that instead of working with a single value, like `SQRT()` and `POWER()` do, aggregation functions work with vectors of data and generate summary statistics. They describe how existing columns should be summarized when the data are collapsed. The aggregation functions are:\n",
    "\n",
    "* `COUNT(*)` - an overall count of the number of rows within each group\n",
    "* `COUNT(a)` - a count of the number of non-missing observations of column `a` within each group\n",
    "* `COUNT(DISTINCT a)` - a count of the number of distinct observations of column `a` within each group\n",
    "* `AVG(a)` - the mean of the values of `a` within each group\n",
    "* `SUM(a)` - the sum of the values of `a` within each group\n",
    "* `MAX(a)`- the maximum value of `a` within each group\n",
    "* `MIN(a)`- the minimum value of `a` within each group\n",
    "* `VARIANCE(a)` and `VAR_SAMP(a)` - the population and sample variances, respectively, of the values of `a` within each group\n",
    "* `STDDEV(a)` and `STDDEV_SAMP(a)` - the population and sample standard deviations, respectively, of the values of `a` within each group\n",
    "\n",
    "Additional statistics, like the median, mode, and various quantiles are not included in standard SQL but are available in extensions that are specific to a DBMS, such as the [quantile extension](https://pgxn.org/dist/quantile/) for PostgreSQL.\n",
    "\n",
    "One important point about the aggregation functions is that, with the exception of `COUNT()`, they **ignore NULL values**. Suppose that we have a data vector with values (1,3,8,NULL). Because we do not know the value of the fourth value, we cannot calculate the true sum and true mean of the values. However the `SUM()` function in SQL ignores the NULL value and reports the sum as 12, which makes a strong tacit assumption that the NULL value is equal to 0. The `AVG()` function calculates the mean from the observed values, and reports (1+3+8)/3 = 4, but this too makes a strong assumption that the NULL value is exactly 4. There are situations in which calculations from the non-NULL values are appropriate, but it is not correct to make broad claims from these summary statistics when there are missing values in the columns being summarized.\n",
    "\n",
    "The next two lines in the template are placeholders for the syntax we use to join data tables and the syntax we use to filter rows with the `WHERE` clause. There is a great deal of similarity between `WHERE` and `HAVING`, which we will discuss shortly.\n",
    "\n",
    "The fourth line in the template,\n",
    "\n",
    "* `GROUP BY groupingcolumns`\n",
    "\n",
    "is the key line for activating the aggregation functionality of SQL. `groupingcolumns` can include one or more columns. If one column is listed, the unique values of that column define the groups that will comprise the rows of the output data. If there is more than one column listed, the unique combinations of values from the columns define the groups.\n",
    "\n",
    "The fifth line in the template uses the `HAVING` clause. `HAVING` is very similar to `WHERE` in that both use logical conditions to identify a selection of the rows to include in the output data. The difference between `WHERE` and `HAVING` is that `WHERE` operates on rows in the original data prior to aggregation, and `HAVING` works on rows after aggregation has occurred. One limitation of `HAVING` is that it will not recognize new column names defined in `SELECT`, so the same aggregation functions used in `SELECT` need to be used again in the logical conditions for `HAVING`. Finally, if we want to sort, we can include the `ORDER BY` clause last.\n",
    "\n",
    "For example, let's find out which country produces wines with the highest average score. To do that, we need a query that joins reviews and locations together, includes country name, the average score across wines from that country, and for good measure, a count of the number of wines reviewed from that country. To collapse on country, we can use `GROUP BY`, and to produce the average score and the count of wines we use the `AVG()` and `COUNT()` functions. For presentation purposes, I choose to round the average score to one decimal and to sort the rows from the highest to lowest average score, so that we can immediately see which countries produce the highest-rated wines. The query is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>average_points</th>\n",
       "      <th>numberofwines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>England</td>\n",
       "      <td>91.6</td>\n",
       "      <td>74</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>India</td>\n",
       "      <td>90.2</td>\n",
       "      <td>9</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Austria</td>\n",
       "      <td>90.1</td>\n",
       "      <td>3337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Germany</td>\n",
       "      <td>89.9</td>\n",
       "      <td>2134</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Canada</td>\n",
       "      <td>89.4</td>\n",
       "      <td>256</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Hungary</td>\n",
       "      <td>89.2</td>\n",
       "      <td>145</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>China</td>\n",
       "      <td>89.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>US</td>\n",
       "      <td>89.0</td>\n",
       "      <td>37730</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>France</td>\n",
       "      <td>88.9</td>\n",
       "      <td>21828</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>Italy</td>\n",
       "      <td>88.8</td>\n",
       "      <td>11042</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Australia</td>\n",
       "      <td>88.8</td>\n",
       "      <td>2037</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Luxembourg</td>\n",
       "      <td>88.7</td>\n",
       "      <td>6</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>None</td>\n",
       "      <td>88.6</td>\n",
       "      <td>63</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>Morocco</td>\n",
       "      <td>88.6</td>\n",
       "      <td>28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>Switzerland</td>\n",
       "      <td>88.6</td>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>Israel</td>\n",
       "      <td>88.5</td>\n",
       "      <td>500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>New Zealand</td>\n",
       "      <td>88.3</td>\n",
       "      <td>1311</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>South Africa</td>\n",
       "      <td>88.2</td>\n",
       "      <td>1328</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>Portugal</td>\n",
       "      <td>88.2</td>\n",
       "      <td>5686</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>Slovenia</td>\n",
       "      <td>88.1</td>\n",
       "      <td>87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>Turkey</td>\n",
       "      <td>88.1</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>Bulgaria</td>\n",
       "      <td>87.9</td>\n",
       "      <td>141</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>Georgia</td>\n",
       "      <td>87.7</td>\n",
       "      <td>86</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>23</th>\n",
       "      <td>Lebanon</td>\n",
       "      <td>87.7</td>\n",
       "      <td>35</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>24</th>\n",
       "      <td>Armenia</td>\n",
       "      <td>87.5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25</th>\n",
       "      <td>Serbia</td>\n",
       "      <td>87.5</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>Czech Republic</td>\n",
       "      <td>87.3</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>27</th>\n",
       "      <td>Greece</td>\n",
       "      <td>87.3</td>\n",
       "      <td>466</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>28</th>\n",
       "      <td>Spain</td>\n",
       "      <td>87.3</td>\n",
       "      <td>6581</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>29</th>\n",
       "      <td>Moldova</td>\n",
       "      <td>87.2</td>\n",
       "      <td>59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>30</th>\n",
       "      <td>Croatia</td>\n",
       "      <td>87.2</td>\n",
       "      <td>73</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>31</th>\n",
       "      <td>Cyprus</td>\n",
       "      <td>87.2</td>\n",
       "      <td>11</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>32</th>\n",
       "      <td>Slovakia</td>\n",
       "      <td>87.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>33</th>\n",
       "      <td>Uruguay</td>\n",
       "      <td>86.8</td>\n",
       "      <td>109</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>34</th>\n",
       "      <td>Macedonia</td>\n",
       "      <td>86.8</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>35</th>\n",
       "      <td>Argentina</td>\n",
       "      <td>86.7</td>\n",
       "      <td>3797</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>36</th>\n",
       "      <td>Bosnia and Herzegovina</td>\n",
       "      <td>86.5</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>37</th>\n",
       "      <td>Chile</td>\n",
       "      <td>86.5</td>\n",
       "      <td>4361</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>38</th>\n",
       "      <td>Romania</td>\n",
       "      <td>86.4</td>\n",
       "      <td>120</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39</th>\n",
       "      <td>Mexico</td>\n",
       "      <td>85.3</td>\n",
       "      <td>65</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>40</th>\n",
       "      <td>Brazil</td>\n",
       "      <td>84.7</td>\n",
       "      <td>52</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41</th>\n",
       "      <td>Ukraine</td>\n",
       "      <td>84.1</td>\n",
       "      <td>14</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>42</th>\n",
       "      <td>Egypt</td>\n",
       "      <td>84.0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>43</th>\n",
       "      <td>Peru</td>\n",
       "      <td>83.6</td>\n",
       "      <td>16</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   country  average_points  numberofwines\n",
       "0                  England            91.6             74\n",
       "1                    India            90.2              9\n",
       "2                  Austria            90.1           3337\n",
       "3                  Germany            89.9           2134\n",
       "4                   Canada            89.4            256\n",
       "5                  Hungary            89.2            145\n",
       "6                    China            89.0              1\n",
       "7                       US            89.0          37730\n",
       "8                   France            88.9          21828\n",
       "9                    Italy            88.8          11042\n",
       "10               Australia            88.8           2037\n",
       "11              Luxembourg            88.7              6\n",
       "12                    None            88.6             63\n",
       "13                 Morocco            88.6             28\n",
       "14             Switzerland            88.6              7\n",
       "15                  Israel            88.5            500\n",
       "16             New Zealand            88.3           1311\n",
       "17            South Africa            88.2           1328\n",
       "18                Portugal            88.2           5686\n",
       "19                Slovenia            88.1             87\n",
       "20                  Turkey            88.1             90\n",
       "21                Bulgaria            87.9            141\n",
       "22                 Georgia            87.7             86\n",
       "23                 Lebanon            87.7             35\n",
       "24                 Armenia            87.5              2\n",
       "25                  Serbia            87.5             12\n",
       "26          Czech Republic            87.3             12\n",
       "27                  Greece            87.3            466\n",
       "28                   Spain            87.3           6581\n",
       "29                 Moldova            87.2             59\n",
       "30                 Croatia            87.2             73\n",
       "31                  Cyprus            87.2             11\n",
       "32                Slovakia            87.0              1\n",
       "33                 Uruguay            86.8            109\n",
       "34               Macedonia            86.8             12\n",
       "35               Argentina            86.7           3797\n",
       "36  Bosnia and Herzegovina            86.5              2\n",
       "37                   Chile            86.5           4361\n",
       "38                 Romania            86.4            120\n",
       "39                  Mexico            85.3             65\n",
       "40                  Brazil            84.7             52\n",
       "41                 Ukraine            84.1             14\n",
       "42                   Egypt            84.0              1\n",
       "43                    Peru            83.6             16"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT l.country,\n",
    "    ROUND(AVG(points),1) as average_points,\n",
    "    COUNT(*) as numberofwines\n",
    "FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "GROUP BY l.country\n",
    "ORDER BY average_points DESC;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So the country that produces the best wine is . . . England? Wait, that can't be right. Looking at the results, I see that some of the countries only have a small number of wines reviewed. China, for example, has only one wine review in the database, so we definitely should not put as much confidence in China's mean score as we do in the scores of countries with many more reviews, like France and the U.S. For a fairer comparison, let's restrict the output to only those countries with at least 500 wines in the database. To filter on this condition, we use `HAVING` and not `WHERE` because the condition involves an aggregation function, specifically the count of the number of wines per country: `WHERE` filters individual rows before they are grouped, while `HAVING` filters the groups after the aggregation is computed. We can rerun the previous query, including the `HAVING` clause:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>average_points</th>\n",
       "      <th>numberofwines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Austria</td>\n",
       "      <td>90.1</td>\n",
       "      <td>3337</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Germany</td>\n",
       "      <td>89.9</td>\n",
       "      <td>2134</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>US</td>\n",
       "      <td>89.0</td>\n",
       "      <td>37730</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>France</td>\n",
       "      <td>88.9</td>\n",
       "      <td>21828</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Italy</td>\n",
       "      <td>88.8</td>\n",
       "      <td>11042</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>Australia</td>\n",
       "      <td>88.8</td>\n",
       "      <td>2037</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>Israel</td>\n",
       "      <td>88.5</td>\n",
       "      <td>500</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>New Zealand</td>\n",
       "      <td>88.3</td>\n",
       "      <td>1311</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>Portugal</td>\n",
       "      <td>88.2</td>\n",
       "      <td>5686</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>South Africa</td>\n",
       "      <td>88.2</td>\n",
       "      <td>1328</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>Spain</td>\n",
       "      <td>87.3</td>\n",
       "      <td>6581</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>Argentina</td>\n",
       "      <td>86.7</td>\n",
       "      <td>3797</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>Chile</td>\n",
       "      <td>86.5</td>\n",
       "      <td>4361</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         country  average_points  numberofwines\n",
       "0        Austria            90.1           3337\n",
       "1        Germany            89.9           2134\n",
       "2             US            89.0          37730\n",
       "3         France            88.9          21828\n",
       "4          Italy            88.8          11042\n",
       "5      Australia            88.8           2037\n",
       "6         Israel            88.5            500\n",
       "7    New Zealand            88.3           1311\n",
       "8       Portugal            88.2           5686\n",
       "9   South Africa            88.2           1328\n",
       "10         Spain            87.3           6581\n",
       "11     Argentina            86.7           3797\n",
       "12         Chile            86.5           4361"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT l.country,\n",
    "    ROUND(AVG(points),1) as average_points,\n",
    "    COUNT(*) as numberofwines\n",
    "FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "GROUP BY l.country\n",
    "    HAVING COUNT(*) >= 500\n",
    "ORDER BY average_points DESC;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we were interested in ranking the countries according to their scores for a particular type of wine. To filter rows from the original data, we use a `WHERE` clause prior to `GROUP BY`. If we write `WHERE r.variety = 'Riesling'` prior to `GROUP BY`, the DBMS first extracts only the rows from the reviews table that refer to Riesling wines, then proceeds with the rest of the query. The following code ranks the countries based on their average scores for Rieslings, given at least 100 Rieslings from that country in the database:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>average_points</th>\n",
       "      <th>numberofwines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Austria</td>\n",
       "      <td>91.4</td>\n",
       "      <td>581</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>France</td>\n",
       "      <td>90.5</td>\n",
       "      <td>691</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Germany</td>\n",
       "      <td>90.1</td>\n",
       "      <td>1768</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Australia</td>\n",
       "      <td>89.4</td>\n",
       "      <td>111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>US</td>\n",
       "      <td>88.1</td>\n",
       "      <td>1600</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     country  average_points  numberofwines\n",
       "0    Austria            91.4            581\n",
       "1     France            90.5            691\n",
       "2    Germany            90.1           1768\n",
       "3  Australia            89.4            111\n",
       "4         US            88.1           1600"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT l.country,\n",
    "    ROUND(AVG(points),1) as average_points,\n",
    "    COUNT(*) as numberofwines\n",
    "FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "WHERE r.variety = 'Riesling'\n",
    "GROUP BY l.country\n",
    "    HAVING COUNT(*) >= 100\n",
    "ORDER BY average_points DESC;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
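  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the difference between `WHERE` and `HAVING` concrete, here is a minimal sketch that runs the same pattern on a tiny in-memory SQLite database. The table `toy_reviews` and its rows are invented for illustration and are not part of the wine database, and SQLite only stands in for the DBMS used in the rest of this chapter.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Hypothetical toy data: (country, variety, points). Not from the wine database.\n",
    "rows = [\n",
    "    ('Austria', 'Riesling', 92),\n",
    "    ('Austria', 'Riesling', 90),\n",
    "    ('Austria', 'Chardonnay', 80),\n",
    "    ('Germany', 'Riesling', 88),\n",
    "    ('US', 'Riesling', 86),\n",
    "    ('US', 'Riesling', 84),\n",
    "    ('US', 'Syrah', 95),\n",
    "]\n",
    "\n",
    "con = sqlite3.connect(':memory:')\n",
    "con.execute('CREATE TABLE toy_reviews (country TEXT, variety TEXT, points INTEGER)')\n",
    "con.executemany('INSERT INTO toy_reviews VALUES (?, ?, ?)', rows)\n",
    "\n",
    "toy_query = \"\"\"\n",
    "SELECT country,\n",
    "    ROUND(AVG(points), 1) AS average_points,\n",
    "    COUNT(*) AS numberofwines\n",
    "FROM toy_reviews\n",
    "WHERE variety = 'Riesling'   -- drops individual rows before grouping\n",
    "GROUP BY country\n",
    "    HAVING COUNT(*) >= 2     -- drops whole groups after aggregation\n",
    "ORDER BY average_points DESC;\n",
    "\"\"\"\n",
    "result = con.execute(toy_query).fetchall()\n",
    "print(result)\n",
    "```\n",
    "\n",
    "In this toy table, the Austrian Chardonnay and the US Syrah never enter the averages because `WHERE` removes them first, and Germany is dropped afterwards because only one Riesling row remains in its group."
   ]
  },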
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Sometimes the groups in the data are formed by more than one column. In that case, simply add the second column name to the `GROUP BY` clause. For example, if we want to know the top-rated combination of country and variety (with a minimum of 50 wines for that combination), we can use the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>country</th>\n",
       "      <th>variety</th>\n",
       "      <th>average_points</th>\n",
       "      <th>numberofwines</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>France</td>\n",
       "      <td>Tannat</td>\n",
       "      <td>91.5</td>\n",
       "      <td>59</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Austria</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>91.4</td>\n",
       "      <td>581</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>South Africa</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>90.6</td>\n",
       "      <td>84</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>France</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>90.5</td>\n",
       "      <td>691</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Austria</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>90.4</td>\n",
       "      <td>62</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>190</th>\n",
       "      <td>Argentina</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>84.9</td>\n",
       "      <td>295</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>191</th>\n",
       "      <td>Spain</td>\n",
       "      <td>Rosé</td>\n",
       "      <td>84.9</td>\n",
       "      <td>149</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>192</th>\n",
       "      <td>Portugal</td>\n",
       "      <td>Rosé</td>\n",
       "      <td>84.6</td>\n",
       "      <td>235</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>193</th>\n",
       "      <td>Spain</td>\n",
       "      <td>Rosado</td>\n",
       "      <td>84.6</td>\n",
       "      <td>71</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>194</th>\n",
       "      <td>Argentina</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>84.3</td>\n",
       "      <td>78</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>195 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "          country                   variety  average_points  numberofwines\n",
       "0          France                    Tannat            91.5             59\n",
       "1         Austria                  Riesling            91.4            581\n",
       "2    South Africa  Bordeaux-style Red Blend            90.6             84\n",
       "3          France                  Riesling            90.5            691\n",
       "4         Austria                Chardonnay            90.4             62\n",
       "..            ...                       ...             ...            ...\n",
       "190     Argentina                Chardonnay            84.9            295\n",
       "191         Spain                      Rosé            84.9            149\n",
       "192      Portugal                      Rosé            84.6            235\n",
       "193         Spain                    Rosado            84.6             71\n",
       "194     Argentina           Sauvignon Blanc            84.3             78\n",
       "\n",
       "[195 rows x 4 columns]"
      ]
     },
     "execution_count": 53,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT l.country, r.variety,\n",
    "    ROUND(AVG(points),1) as average_points,\n",
    "    COUNT(*) as numberofwines\n",
    "FROM reviews r\n",
    "INNER JOIN locations l\n",
    "    ON r.location_id = l.location_id\n",
    "GROUP BY l.country, r.variety\n",
    "    HAVING COUNT(*) >= 50\n",
    "ORDER BY average_points DESC;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
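  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a small illustration of grouping by two columns, the sketch below uses an in-memory SQLite table with made-up rows (the `toy_reviews` table and its values are assumptions, not data from the wine database). Each distinct pair of country and variety becomes its own group.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Hypothetical toy data: (country, variety, points).\n",
    "rows = [\n",
    "    ('France', 'Tannat', 92),\n",
    "    ('France', 'Tannat', 91),\n",
    "    ('France', 'Riesling', 90),\n",
    "    ('Austria', 'Riesling', 93),\n",
    "    ('Austria', 'Riesling', 89),\n",
    "]\n",
    "\n",
    "con = sqlite3.connect(':memory:')\n",
    "con.execute('CREATE TABLE toy_reviews (country TEXT, variety TEXT, points INTEGER)')\n",
    "con.executemany('INSERT INTO toy_reviews VALUES (?, ?, ?)', rows)\n",
    "\n",
    "pair_query = \"\"\"\n",
    "SELECT country, variety,\n",
    "    ROUND(AVG(points), 1) AS average_points,\n",
    "    COUNT(*) AS numberofwines\n",
    "FROM toy_reviews\n",
    "GROUP BY country, variety    -- one group per (country, variety) pair\n",
    "ORDER BY average_points DESC;\n",
    "\"\"\"\n",
    "pairs = con.execute(pair_query).fetchall()\n",
    "for row in pairs:\n",
    "    print(row)\n",
    "```\n",
    "\n",
    "France appears twice in the output of this sketch, once for each variety, because the groups are defined by the pair of columns rather than by country alone."
   ]
  },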
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Subqueries\n",
    "There are situations in which it makes sense to use data aggregation techniques to generate new columns filled with group-level summary statistics, then to place these summary statistics into the original data table. To do this work, we can use **subqueries**. A subquery is a full SQL query that is used inside another SQL query. There are three types of subquery:\n",
    "\n",
    "1. Subqueries, like the ones for the mean and standard deviation below, that yield a single value. These subqueries can be used anywhere we might write a value in the query, such as when defining new columns in `SELECT` or filtering rows with `WHERE` or `HAVING`.\n",
    "\n",
    "2. Subqueries that yield a list of values that can be used inside logical statements that include the `IN` operator.\n",
    "\n",
    "3. Subqueries that yield a data table that can be joined to existing data tables.\n",
    "\n",
    "Suppose, for example, that we wanted to generate a Z-score standardized version of the wine review points. A Z-score subtracts the mean of a column from every value in the column, then divides each difference by the standard deviation of the column. When a Z-score equals 1, it means that the original value is one standard deviation above the mean of the column. To calculate the Z-score, we need to calculate the mean of the points column,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>avg</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>88.612107</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         avg\n",
       "0  88.612107"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT AVG(points) FROM reviews;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and the sample standard deviation of the points column,"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>stddev_samp</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2.955039</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   stddev_samp\n",
       "0     2.955039"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT STDDEV_SAMP(points) FROM reviews;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can type these values in manually with a query that looks like:\n",
    "```\n",
    "SELECT title, variety, points,\n",
    "(points - 88.612107)/(2.955039) as points_z \n",
    "FROM reviews;\n",
    "```\n",
    "The problem with typing these values in by hand is that it is easy to make a mistake and accidentally corrupt the `points_z` column because of a typo. Also, these values will have to be changed by hand every time the data inside the database is updated. A better solution is to have SQL do the work of calculating the mean and standard deviation for us by using subqueries. All we need to do is replace the values with the queries (contained in parentheses) that generate those single values. In this case, the query is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>points</th>\n",
       "      <th>points_z</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Casas del Bosque 2011 Reserva Sauvignon Blanc ...</td>\n",
       "      <td>Sauvignon Blanc</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Marqués de Terán 2009 Selección Especial  (Rioja)</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Maurice Ecard 2009  Bourgogne</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>McGregor 2010 Semi-Dry Riesling (Finger Lakes)</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Jules Taylor 2009 Ballochdale Estate Pinot Noi...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103722</th>\n",
       "      <td>Glenora 2010 Gewürztraminer (Finger Lakes)</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103723</th>\n",
       "      <td>Hard Row To Hoe 2010 Marsanne (Yakima Valley)</td>\n",
       "      <td>Marsanne</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103724</th>\n",
       "      <td>Animale 2009 Dolcetto (Columbia Valley (WA))</td>\n",
       "      <td>Dolcetto</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103725</th>\n",
       "      <td>Beresan 2008 The Buzz Yellow Jacket Vineyard R...</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>Cabot Vineyards 2007 Syrah (Humboldt County)</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>86</td>\n",
       "      <td>-0.88395</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103727 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                    title          variety  \\\n",
       "0       Casas del Bosque 2011 Reserva Sauvignon Blanc ...  Sauvignon Blanc   \n",
       "1       Marqués de Terán 2009 Selección Especial  (Rioja)      Tempranillo   \n",
       "2                           Maurice Ecard 2009  Bourgogne       Chardonnay   \n",
       "3          McGregor 2010 Semi-Dry Riesling (Finger Lakes)         Riesling   \n",
       "4       Jules Taylor 2009 Ballochdale Estate Pinot Noi...       Pinot Noir   \n",
       "...                                                   ...              ...   \n",
       "103722         Glenora 2010 Gewürztraminer (Finger Lakes)   Gewürztraminer   \n",
       "103723      Hard Row To Hoe 2010 Marsanne (Yakima Valley)         Marsanne   \n",
       "103724       Animale 2009 Dolcetto (Columbia Valley (WA))         Dolcetto   \n",
       "103725  Beresan 2008 The Buzz Yellow Jacket Vineyard R...        Red Blend   \n",
       "103726       Cabot Vineyards 2007 Syrah (Humboldt County)            Syrah   \n",
       "\n",
       "        points  points_z  \n",
       "0           86  -0.88395  \n",
       "1           86  -0.88395  \n",
       "2           86  -0.88395  \n",
       "3           86  -0.88395  \n",
       "4           86  -0.88395  \n",
       "...        ...       ...  \n",
       "103722      86  -0.88395  \n",
       "103723      86  -0.88395  \n",
       "103724      86  -0.88395  \n",
       "103725      86  -0.88395  \n",
       "103726      86  -0.88395  \n",
       "\n",
       "[103727 rows x 4 columns]"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT title, variety, points,\n",
    "(points - (SELECT AVG(points) FROM reviews))/(SELECT STDDEV_SAMP(points) FROM reviews) as points_z \n",
    "FROM reviews;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
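  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same single-value subquery pattern can be tried on a toy table where the answer is easy to check by hand. This is only a sketch under stated assumptions: the `toy_reviews` table and its three rows are invented, and because SQLite has no built-in `STDDEV_SAMP`, the code registers a hypothetical Python aggregate under that name so that the query has the same shape as the one above.\n",
    "\n",
    "```python\n",
    "import sqlite3\n",
    "import statistics\n",
    "\n",
    "class StdDevSamp:\n",
    "    # Sample standard deviation; an illustrative stand-in, since SQLite lacks one.\n",
    "    def __init__(self):\n",
    "        self.values = []\n",
    "\n",
    "    def step(self, value):\n",
    "        # Called once per row; NULLs are skipped, as SQL aggregates do.\n",
    "        if value is not None:\n",
    "            self.values.append(value)\n",
    "\n",
    "    def finalize(self):\n",
    "        # Called once at the end; needs at least two values.\n",
    "        return statistics.stdev(self.values) if len(self.values) > 1 else None\n",
    "\n",
    "con = sqlite3.connect(':memory:')\n",
    "con.create_aggregate('STDDEV_SAMP', 1, StdDevSamp)\n",
    "con.execute('CREATE TABLE toy_reviews (title TEXT, points INTEGER)')\n",
    "con.executemany('INSERT INTO toy_reviews VALUES (?, ?)',\n",
    "                [('Wine A', 84), ('Wine B', 88), ('Wine C', 92)])\n",
    "\n",
    "z_query = \"\"\"\n",
    "SELECT title, points,\n",
    "    (points - (SELECT AVG(points) FROM toy_reviews))\n",
    "        / (SELECT STDDEV_SAMP(points) FROM toy_reviews) AS points_z\n",
    "FROM toy_reviews\n",
    "ORDER BY points;\n",
    "\"\"\"\n",
    "z_scores = con.execute(z_query).fetchall()\n",
    "print(z_scores)\n",
    "```\n",
    "\n",
    "With points of 84, 88, and 92, the mean is 88 and the sample standard deviation is 4, so the three Z-scores should come out as -1, 0, and 1. Each subquery in parentheses returns exactly one value, which is why it can sit where a number would otherwise be typed."
   ]
  },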
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose that we wanted to restrict the rows to only the wines from wineries with at least 100 wines in the data. The problem is we don't know which wineries have at least 100 reviewed wines. But we can find out with a query:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 57,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>winery_id</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>9112</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4562</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>9557</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>13540</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>13381</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>1547</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>4257</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>13118</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>2007</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>7060</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>224</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>9995</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>8022</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>5076</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>4586</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>2375</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>14401</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>12042</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>9111</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>4124</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "    winery_id\n",
       "0        9112\n",
       "1        4562\n",
       "2        9557\n",
       "3       13540\n",
       "4       13381\n",
       "5        1547\n",
       "6        4257\n",
       "7       13118\n",
       "8        2007\n",
       "9        7060\n",
       "10        224\n",
       "11       9995\n",
       "12       8022\n",
       "13       5076\n",
       "14       4586\n",
       "15       2375\n",
       "16      14401\n",
       "17      12042\n",
       "18       9111\n",
       "19       4124"
      ]
     },
     "execution_count": 57,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT winery_id FROM reviews\n",
    "GROUP BY winery_id\n",
    "    HAVING COUNT(*) >= 100;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This query gives us a list of winery ID numbers that match the wineries with at least 100 reviewed wines in the data. We can now use this list inside another query that restricts the reviews data to only the wines from this list of wineries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>winery_id</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>14401</td>\n",
       "      <td>Wines &amp; Winemakers 2012 Pegos Claros Colheita ...</td>\n",
       "      <td>Castelão</td>\n",
       "      <td>87</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>14401</td>\n",
       "      <td>Wines &amp; Winemakers 2013 Lua Cheia em Vinhas Ve...</td>\n",
       "      <td>Portuguese Red</td>\n",
       "      <td>87</td>\n",
       "      <td>18.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>14401</td>\n",
       "      <td>Wines &amp; Winemakers 2013 Lua Cheia em Vinhas Ve...</td>\n",
       "      <td>Portuguese Red</td>\n",
       "      <td>87</td>\n",
       "      <td>12.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4124</td>\n",
       "      <td>Chateau Ste. Michelle 2008 Syrah (Columbia Val...</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>87</td>\n",
       "      <td>13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>4586</td>\n",
       "      <td>Concha y Toro 2010 Gravas del Maipo Syrah (Mai...</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>91</td>\n",
       "      <td>200.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2742</th>\n",
       "      <td>13381</td>\n",
       "      <td>Trapiche 2016 Pure Malbec (Uco Valley)</td>\n",
       "      <td>Malbec</td>\n",
       "      <td>88</td>\n",
       "      <td>15.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2743</th>\n",
       "      <td>9111</td>\n",
       "      <td>Louis Jadot 2005 La Dominode Premier Cru  (Sav...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>90</td>\n",
       "      <td>37.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2744</th>\n",
       "      <td>9995</td>\n",
       "      <td>Montes 2009 Limited Selection Pinot Noir (Casa...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>89</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2745</th>\n",
       "      <td>4124</td>\n",
       "      <td>Chateau Ste. Michelle 2012 Canoe Ridge Estate ...</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>89</td>\n",
       "      <td>22.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2746</th>\n",
       "      <td>14401</td>\n",
       "      <td>Wines &amp; Winemakers 2015 Nostalgia Alvarinho (V...</td>\n",
       "      <td>Alvarinho</td>\n",
       "      <td>88</td>\n",
       "      <td>23.0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2747 rows × 5 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      winery_id                                              title  \\\n",
       "0         14401  Wines & Winemakers 2012 Pegos Claros Colheita ...   \n",
       "1         14401  Wines & Winemakers 2013 Lua Cheia em Vinhas Ve...   \n",
       "2         14401  Wines & Winemakers 2013 Lua Cheia em Vinhas Ve...   \n",
       "3          4124  Chateau Ste. Michelle 2008 Syrah (Columbia Val...   \n",
       "4          4586  Concha y Toro 2010 Gravas del Maipo Syrah (Mai...   \n",
       "...         ...                                                ...   \n",
       "2742      13381             Trapiche 2016 Pure Malbec (Uco Valley)   \n",
       "2743       9111  Louis Jadot 2005 La Dominode Premier Cru  (Sav...   \n",
       "2744       9995  Montes 2009 Limited Selection Pinot Noir (Casa...   \n",
       "2745       4124  Chateau Ste. Michelle 2012 Canoe Ridge Estate ...   \n",
       "2746      14401  Wines & Winemakers 2015 Nostalgia Alvarinho (V...   \n",
       "\n",
       "             variety  points  price  \n",
       "0           Castelão      87   15.0  \n",
       "1     Portuguese Red      87   18.0  \n",
       "2     Portuguese Red      87   12.0  \n",
       "3              Syrah      87   13.0  \n",
       "4              Syrah      91  200.0  \n",
       "...              ...     ...    ...  \n",
       "2742          Malbec      88   15.0  \n",
       "2743      Pinot Noir      90   37.0  \n",
       "2744      Pinot Noir      89   20.0  \n",
       "2745      Chardonnay      89   22.0  \n",
       "2746       Alvarinho      88   23.0  \n",
       "\n",
       "[2747 rows x 5 columns]"
      ]
     },
     "execution_count": 58,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT winery_id, title, variety, points, price FROM reviews\n",
    "WHERE winery_id IN (\n",
    "    SELECT winery_id FROM reviews r\n",
    "    GROUP BY winery_id\n",
    "        HAVING COUNT(*) >= 100\n",
    "    );\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose we wanted a data table with the top rated wine from each winery. The problem is we don't know which wine is the top rated for each winery. Again, we can find out with a query that groups the reviews data by winery ID and uses the `MAX()` aggregation function to identify the maximum score achieved for any wine from that winery: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>winery_id</th>\n",
       "      <th>maxpoints</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>11233</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>4790</td>\n",
       "      <td>95</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3936</td>\n",
       "      <td>87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>12502</td>\n",
       "      <td>87</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5468</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14566</th>\n",
       "      <td>4035</td>\n",
       "      <td>89</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14567</th>\n",
       "      <td>9180</td>\n",
       "      <td>92</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14568</th>\n",
       "      <td>4827</td>\n",
       "      <td>94</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14569</th>\n",
       "      <td>790</td>\n",
       "      <td>83</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14570</th>\n",
       "      <td>10896</td>\n",
       "      <td>91</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>14571 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "       winery_id  maxpoints\n",
       "0          11233         91\n",
       "1           4790         95\n",
       "2           3936         87\n",
       "3          12502         87\n",
       "4           5468         92\n",
       "...          ...        ...\n",
       "14566       4035         89\n",
       "14567       9180         92\n",
       "14568       4827         94\n",
       "14569        790         83\n",
       "14570      10896         91\n",
       "\n",
       "[14571 rows x 2 columns]"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT winery_id, MAX(points) as maxpoints\n",
    "FROM reviews\n",
    "GROUP BY winery_id;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose that this last table already existed in the database with the name \"bestscores\". We would be able to join bestscores and reviews, then filter the rows to only those wines whose scores are equal to the maximum scores achieved by the winery with the following code:\n",
    "```\n",
    "SELECT r.title, r.variety, r.points, r.price FROM reviews r\n",
    "INNER JOIN bestscores b\n",
    "    ON r.winery_id = b.winery_id\n",
    "WHERE r.points = b.maxpoints;\n",
    "```\n",
    "But because we do not have a table named \"bestscores\", we can replace the reference to this table with the subquery that generates this table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Marqués de Terán 2009 Selección Especial  (Rioja)</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>86</td>\n",
       "      <td>24.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Alessandro Veglio 2011 Gattera  (Barolo)</td>\n",
       "      <td>Nebbiolo</td>\n",
       "      <td>87</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Pingao 2013  Rioja</td>\n",
       "      <td>Tempranillo</td>\n",
       "      <td>87</td>\n",
       "      <td>13.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Chateau Walla Walla 2008 Syrah (Walla Walla Va...</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>87</td>\n",
       "      <td>40.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Sweet Valley 2008 Cabernet Sauvignon (Walla Wa...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>87</td>\n",
       "      <td>35.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20643</th>\n",
       "      <td>Kicker Cane 2014 Cabernet Sauvignon (Alexander...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>88</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20644</th>\n",
       "      <td>Tenuta Grimani 2015 Farinaldo  (Soave)</td>\n",
       "      <td>Garganega</td>\n",
       "      <td>88</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20645</th>\n",
       "      <td>Vin Vault NV Cabernet Sauvignon (California)</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>88</td>\n",
       "      <td>20.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20646</th>\n",
       "      <td>Dachshund NV Bubbles Sparkling (Germany)</td>\n",
       "      <td>Sparkling Blend</td>\n",
       "      <td>88</td>\n",
       "      <td>17.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20647</th>\n",
       "      <td>Domaine Guillot-Broux 2009 Beaumont  (Mâcon-Cr...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>88</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>20648 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                   title             variety  \\\n",
       "0      Marqués de Terán 2009 Selección Especial  (Rioja)         Tempranillo   \n",
       "1               Alessandro Veglio 2011 Gattera  (Barolo)            Nebbiolo   \n",
       "2                                     Pingao 2013  Rioja         Tempranillo   \n",
       "3      Chateau Walla Walla 2008 Syrah (Walla Walla Va...               Syrah   \n",
       "4      Sweet Valley 2008 Cabernet Sauvignon (Walla Wa...  Cabernet Sauvignon   \n",
       "...                                                  ...                 ...   \n",
       "20643  Kicker Cane 2014 Cabernet Sauvignon (Alexander...  Cabernet Sauvignon   \n",
       "20644             Tenuta Grimani 2015 Farinaldo  (Soave)           Garganega   \n",
       "20645       Vin Vault NV Cabernet Sauvignon (California)  Cabernet Sauvignon   \n",
       "20646           Dachshund NV Bubbles Sparkling (Germany)     Sparkling Blend   \n",
       "20647  Domaine Guillot-Broux 2009 Beaumont  (Mâcon-Cr...          Pinot Noir   \n",
       "\n",
       "       points  price  \n",
       "0          86   24.0  \n",
       "1          87    NaN  \n",
       "2          87   13.0  \n",
       "3          87   40.0  \n",
       "4          87   35.0  \n",
       "...       ...    ...  \n",
       "20643      88   20.0  \n",
       "20644      88    NaN  \n",
       "20645      88   20.0  \n",
       "20646      88   17.0  \n",
       "20647      88    NaN  \n",
       "\n",
       "[20648 rows x 4 columns]"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = \"\"\"\n",
    "SELECT r.title, r.variety, r.points, r.price FROM reviews r\n",
    "INNER JOIN (\n",
    "    SELECT winery_id, MAX(points) as maxpoints\n",
    "    FROM reviews\n",
    "    GROUP BY winery_id) b\n",
    "    ON r.winery_id = b.winery_id\n",
    "WHERE r.points = b.maxpoints;\n",
    "\"\"\"\n",
    "pd.read_sql_query(myquery, con=engine)"
   ]
  },
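  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Another way to write a query like this one is with a common table expression: a `WITH` clause gives the subquery a name, such as bestscores, so that the main query reads almost exactly like the version that assumed the table already existed. The following is a minimal sketch rather than code that runs against the wine database: it assumes a small, made-up `reviews` table in an in-memory SQLite database, and the names `conn` and `toyrows` and the toy rows themselves are only for illustration.\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "# Assumption: a throwaway in-memory database with a tiny reviews table\n",
    "conn = sqlite3.connect(':memory:')\n",
    "conn.executescript(\"\"\"\n",
    "CREATE TABLE reviews (winery_id INTEGER, title TEXT, points INTEGER);\n",
    "INSERT INTO reviews VALUES\n",
    "    (1, 'Wine A', 88), (1, 'Wine B', 91),\n",
    "    (2, 'Wine C', 85), (2, 'Wine D', 85);\n",
    "\"\"\")\n",
    "\n",
    "# The WITH clause names the subquery, so the join can refer to bestscores\n",
    "myquery = \"\"\"\n",
    "WITH bestscores AS (\n",
    "    SELECT winery_id, MAX(points) AS maxpoints\n",
    "    FROM reviews\n",
    "    GROUP BY winery_id\n",
    ")\n",
    "SELECT r.winery_id, r.title, r.points FROM reviews r\n",
    "INNER JOIN bestscores b\n",
    "    ON r.winery_id = b.winery_id\n",
    "WHERE r.points = b.maxpoints\n",
    "ORDER BY r.winery_id, r.title;\n",
    "\"\"\"\n",
    "toyrows = conn.execute(myquery).fetchall()\n",
    "print(toyrows)\n",
    "conn.close()\n",
    "```\n",
    "In the toy data, winery 2 has two wines tied at 85 points, and both are returned. Ties like these are why the query above returns more rows (20,648) than there are wineries (14,571)."
   ]
  },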
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, once we are finished working with the PostgreSQL server, we close it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [],
   "source": [
    "dbserver.close()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## SQL Security\n",
    "One criticism of SQL is that it can be manipulated to give unauthorized and hostile users access to perform CRUD operations on the data. This kind of [attack](https://en.wikipedia.org/wiki/SQL_injection) is called an **SQL injection attack**, and the best illustration of this attack comes from an [XKCD](https://xkcd.com) web comic:\n",
    "\n",
    "<a href=\"https://xkcd.com/327/\"><img src=\"https://imgs.xkcd.com/comics/exploits_of_a_mom.png\" width=\"500\"></a>\n",
    "\n",
    "The following discussion borrows heavily from [this blog's](https://explainxkcd.com/wiki/index.php/327:_Exploits_of_a_Mom) explaination of this XKCD comic. \n",
    "\n",
    "An SQL injection attack starts with an entry field on the user interface of an application that works with a database, like a field where users write their name. The application reads the name and creates an SQL INSERT operation to create a new record in the data with the name. If I were to enter Jonathan into the name field, the app should generate an SQL command that looks like this:\n",
    "```\n",
    "INSERT INTO Students (firstname) VALUES ('Jonathan');\n",
    "```\n",
    "This command specifically places the value \"Jonathan\" into the `firstname` attribute of the `Students` entity. In SQL, different commands are separated by semicolons, so if I wanted to issue two SQL commands I could type:\n",
    "```\n",
    "INSERT INTO Students (firstname) VALUES ('Jonathan'); INSERT INTO Students (lastname) VALUES ('Kropko');\n",
    "```\n",
    "An SQL injection attack works by writing SQL code in a field that is designed to collect data to input into a database. So If I type my name as `Jonathan'); DROP TABLE Students; --;`, then the SQL create operation becomes\n",
    "```\n",
    "INSERT INTO Students (firstname) VALUES ('Jonathan'); DROP TABLE Students; --;');\n",
    "```\n",
    "This line consists of three commands\n",
    "* `INSERT INTO Students (firstname) VALUES ('Jonathan');` which inputs \"Jonathan\" into the database,\n",
    "* `DROP TABLE Students;` which deletes the entire `Students` table, and\n",
    "* `--;');`: the `--` symbol is an SQL comment, and tells the parser to ignore the remainder of the code, which would avoid a parsing error.\n",
    "\n",
    "So just by inserting specific code into a seemingly innocuous field, like name, I can delete the entire `Students` entity in the database.\n",
    "\n",
    "There are two ways to combat SQL injection attacks. First, it is possible to \"sanitize\" database inputs by using code that automatically places a slash before a single quote. That puts an [escape character](https://en.wikipedia.org/wiki/Escape_character) in front of the quote, which makes it part of the input string and prevents it from being read as the end of the input string. Another approach is to use [prepared statements](https://en.wikipedia.org/wiki/Prepared_statement) when converting user-entered data into an SQL query. A prepared statement uses placeholders to stand in for the user-supplied data, and treats the data like input into a function: treating the user data this way prevents the entire SQL query from being read as a single string, and prevents SQL injection. For example, instead of inputing the name directly into the query, the database manager can construct the query in Python code (where a database cursor exists and is named `curs`) like this:\n",
    "```\n",
    "cmd = \"INSERT INTO Students (firstname) VALUES (%s)\"\n",
    "curs.execute(cmd, (name,))\n",
    "```\n",
    "In MySQL and PostgreSQL, `%s` stands in for a parameter to be passed into the query (in SQlite, the stand-in symbol is `?` instead of `%s`). Constructing a query in this way prevents SQL injection attacks. More information about formatting secure SQL code is available at https://bobby-tables.com/, named in honor of this XKCD comic.\n",
    "\n",
    "As a data scientist mostly issuing read operations, it is unethical for you to attack a database in this way. If you are testing whether a database is secured against SQL injection attacks, don't try to issue any `DROP` commands as other commands like `SELECT` will reveal the insecurity but won't make changes in the database. If you are building a database that is connected to an interface for users to enter data, please be aware of the SQL injection vulnerability and use prepared statements to guard against it."
   ]
  },
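  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the difference concrete, the following sketch runs the comic's input against a throwaway in-memory SQLite database, once by pasting the name into the query string and once with a placeholder. It is an illustration under assumptions, not the code of any real application: the `Students` table is created on the spot, the helper `table_exists()` is hypothetical, and `executescript()` stands in for an application that lets several statements through in one call.\n",
    "```python\n",
    "import sqlite3\n",
    "\n",
    "name = \"Jonathan'); DROP TABLE Students; --\"\n",
    "\n",
    "def table_exists(conn):\n",
    "    # Hypothetical helper: checks SQLite's catalog for a Students table\n",
    "    cmd = 'SELECT COUNT(*) FROM sqlite_master WHERE type = ? AND name = ?'\n",
    "    return conn.execute(cmd, ('table', 'Students')).fetchone()[0] == 1\n",
    "\n",
    "# Unsafe: the name is pasted into the SQL text, so the DROP TABLE runs\n",
    "unsafe = sqlite3.connect(':memory:')\n",
    "unsafe.execute('CREATE TABLE Students (firstname TEXT)')\n",
    "unsafe.executescript(f\"INSERT INTO Students (firstname) VALUES ('{name}');\")\n",
    "unsafe_survived = table_exists(unsafe)\n",
    "\n",
    "# Safe: the ? placeholder passes the name as data, never as SQL code\n",
    "safe = sqlite3.connect(':memory:')\n",
    "safe.execute('CREATE TABLE Students (firstname TEXT)')\n",
    "safe.execute('INSERT INTO Students (firstname) VALUES (?)', (name,))\n",
    "safe_survived = table_exists(safe)\n",
    "safe_rows = safe.execute('SELECT firstname FROM Students').fetchall()\n",
    "\n",
    "print(unsafe_survived, safe_survived, safe_rows)\n",
    "unsafe.close()\n",
    "safe.close()\n",
    "```\n",
    "The first database loses its `Students` table, while the second keeps the table and stores the entire string, quote and all, as the student's name."
   ]
  },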
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## MongoDB Queries\n",
    "SQL is a universal language for issuing queries to relational databases, whether the database is managed by SQLite, MySQL, PostgreSQL, or another RDBMS. For NoSQL databases, however, there is no universal query language. Every DBMS has its own query language, and will provide a guide for learning that language. Some of these guides include the ones for key-value stores in [Redis](https://redis.io/commands) and wide column stores in [Cassandra](https://cassandra.apache.org/doc/latest/cql/index.html). Neo4j has developed a programming language called [Cypher](https://neo4j.com/developer/cypher-query-language/) that is explicitly for issuing queries to graph databases. Of all of these query protocols, the language used by [MongoDB](https://docs.mongodb.com/guides/server/read_queries/) for issuing queries to document stores is one of the most universal because it works entirely with JSONs: queries are written in JSON format and the output is organized in JSON format. All of these query languages include methods for all of the CRUD operations.\n",
    "\n",
    "The most important difference between relational and NoSQL databases is the rigidity of the schema that organizes the data. The advantage of the strict organization of a relational database, as illustrated in an ER diagram, is that the data that can be extracted from the database using an SQL query will be clean and mostly immediately ready to be analyzed. The disadvantage is that relational databases have schema that are hard to change once they've been created and populated with data. SQL also, despite the best intentions of the originators of SQL, can be very difficult for people use for some tasks. For extremely large datasets with many tables, it can be extremely difficult to keep track of what data exists in which table. In contrast, NoSQL databases generally have flexible schema that can be changed easily and can vary even from record to record. There are no rules, like the normalization rules, that require that the data be split into different tables, so there is no need for visual maps like ER diagrams. Also, because all of the data for one record exists in the same JSON dictionary, it is easy to use remote, distributed storage to store all of the records. The disadvantage of NoSQL databases is that the data are rarely ready for analysis after a query. It's a buy-now-pay-later situation: the price we pay for the convenience of NoSQL storage and organization is that the output requires more work to use.\n",
    "\n",
    "Some concepts that are crucial to SQL are not relevant to NoSQL. There are **no joins** in a document store because all the data for a record exist in the same JSON code. As such, we don't have to worry about accomplishing these tasks within a NoSQL query. NoSQL queries in general focus narrowly on the CRUD operations, although MongoDB provides some advanced functionality for searching for patterns within text and ranking documents based on their relevance to given search terms.\n",
    "\n",
    "For the following examples, I will use the document store database that we created in module 6, containing the same data on wine reviews that we practiced with above, only in JSON format. First I load the `pymongo` package and the `dumps()` and `loads()` functions from the `json_util` module of the `bson` package:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pymongo\n",
    "from bson.json_util import dumps, loads"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The wine reviews database is stored as a collection `winecollection` within the `winedb` database on my local machine. I load it with the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [],
   "source": [
    "myclient = pymongo.MongoClient(\"mongodb://localhost/\")\n",
    "winedb = myclient[\"winedb\"]\n",
    "winecollection = winedb[\"winecollection\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will discuss more advanced read techniques below, but to see one record, we can issue a query using JSON code and we can see the output in JSON format. To see all of the data for the \"Nicosia 2013 Vulkà Bianco\", we search for the record based on the title of this wine with the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 64,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'_id': ObjectId('5ed80dbca25fcf746119e3aa'), 'wine_id': 0, 'country': 'Italy', 'description': \"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.\", 'points': 87, 'price': None, 'province': 'Sicily & Sardinia', 'region': 'Etna', 'taster_name': 'Kerin O’Keefe', 'taster_twitter_handle': '@kerinokeefe', 'title': 'Nicosia 2013 Vulkà Bianco  (Etna)', 'variety': 'White Blend', 'winery': 'Nicosia'}\n"
     ]
    }
   ],
   "source": [
    "myquery = { 'title': 'Nicosia 2013 Vulkà Bianco  (Etna)'}\n",
    "mywine = winecollection.find(myquery) \n",
    "for x in mywine:\n",
    "    print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that all of the data for this wine exists in this JSON dictionary, including data from the reviews, locations, wineries, and tasters tables in the PostgreSQL database. When we created this MongoDB database, the DBMS automatically created a unique ID for each record designated with the key `_id`. \n",
    "\n",
    "We can now use the methods in `pymongo` for creating, reading, updating, and deleting records and we will apply these methods to the `winecollection` variable that accesses the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Creating and Deleting Records\n",
    "Did you know that former NBA all-star Dwyane Wade has a winery? It's called [D Wade Cellars](https://dwadecellars.com/) and it is based in the Napa Valley in California. Let's add the [2016 Napa Valley Three By Wade Red Blend](https://dwadecellars.vinespring.com/purchase/detail?item=2016-napa-valley-three-by-wade-red-blend) into the database. The first step is to express all of the data we want to associate with a new record in JSON format:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 65,
   "metadata": {},
   "outputs": [],
   "source": [
    "dwadewine = {'title': '2016 Napa Valley Three By Wade Red Blend', \n",
    "'description': \"This wine goes great with dinner just like Dwyane Wade goes great with LeBron James or Shaq.\", \n",
    "'taster_name': 'Jonathan Kropko', \n",
    "'taster_twitter_handle': '@jmk5131', \n",
    "'price': '35', \n",
    "'variety': 'Red Blend', \n",
    "'location':{\n",
    "    'region_1': 'Napa Valley', \n",
    "    'region_2': None, \n",
    "    'province': 'California', \n",
    "    'country': 'U.S.', \n",
    "    'winery': 'D Wade Cellars'}}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In creating this JSON record, I tried to follow the standards that exist elsewhere in the data by using the same feature names. I departed from the format of other records in two ways. First, I omitted the points and designation features. Second, I placed all the information about the location and name of the winery under the \"location\" key, which induces some nesting structure.\n",
    "\n",
    "To add this one record to the database, I use the `.insert_one()` method on the `winecollection` database, with the following code:\n",
    "```\n",
    "winecollection.insert_one(dwadewine)\n",
    "```\n",
    "By default, the `insert_one()` method automatically checks to see whether the record already exists in the data, and throws an error if it does, unless we specify the `bypass_document_validation=True` argument, which allows duplicate records to be input into the database. For the purposes of this notebook, I rerun these cells many times while writing, and I don't want to place many duplicate records into the database. Instead, I can delete the record if it already exists. The code\n",
    "```\n",
    "winecollection.count_documents({'title': '2016 Napa Valley Three By Wade Red Blend'})\n",
    "```\n",
    "generates a count of the records of wines in the database that have this title. If there are any existing records, I can delete all of these records with the `.delete_many()` method, in which the argument is a JSON with enough fields specified to exactly match the records we want to delete:\n",
    "```\n",
    "winecollection.delete_many({'title': '2016 Napa Valley Three By Wade Red Blend'})\n",
    "```\n",
    "In constrast, the `.delete_one()` method will only delete the first record, when sorting by `_id`, that matches the query. If there are no documents that match the query, he `.delete_all()` or `.delete_one()` methods will both still process the query without error, but will not change anything in the database.\n",
    "\n",
    "We first delete any records of wines with the title \"2016 Napa Valley Three By Wade Red Blend\" with `.delete_all()`, then we insert the entire record of this wine with `.insert_one()`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pymongo.results.InsertOneResult at 0x12b264a48>"
      ]
     },
     "execution_count": 66,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "winecollection.delete_many({'title': '2016 Napa Valley Three By Wade Red Blend'})\n",
    "winecollection.insert_one(dwadewine)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that this record exists in the database, we can find this record by any of the fields associated with the record, such as the title of the wine for example:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'_id': ObjectId('5edd56e5b4e58ce3841e5dea'), 'title': '2016 Napa Valley Three By Wade Red Blend', 'description': 'This wine goes great with dinner just like Dwyane Wade goes great with LeBron James or Shaq.', 'taster_name': 'Jonathan Kropko', 'taster_twitter_handle': '@jmk5131', 'price': '35', 'variety': 'Red Blend', 'location': {'region_1': 'Napa Valley', 'region_2': None, 'province': 'California', 'country': 'U.S.', 'winery': 'D Wade Cellars'}}\n"
     ]
    }
   ],
   "source": [
    "myquery = {'title': '2016 Napa Valley Three By Wade Red Blend'}\n",
    "mywine = winecollection.find(myquery) \n",
    "for x in mywine:\n",
    "    print(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that MongoDB automatically generates a unique ID value for this document and includes it under the `_id` field in the JSON output.\n",
    "\n",
    "Wikipedia lists [93 other celebrities](https://en.wikipedia.org/wiki/List_of_celebrities_who_own_wineries_and_vineyards) other than Dwyane Wade who own wineries and vineyards, including [Antonio Banderas](https://www.decanter.com/wine-news/antonio-banderas-32255/), [Drew Barrymore](https://thewinesiren.com/drew-barrymore-vintner/), and [Lil Jon](http://www.today.com/id/23945035/ns/today-today_entertainment/t/rapper-lil-jon-starts-his-own-wine-label/#.XtV1BZp7l24). If we want to add more than one record to the wine collection database, we need to create a list of individual JSON dictionaries with code that looks like\n",
    "```\n",
    "newrecords = [{JSON dictionary 1}, {JSON dictionary 2}, {JSON dictionary 3}]\n",
    "```\n",
    "In this case, I can create entries for Antonio Banderas, Drew Barrymore, and Lil Jon's wines and store them in one list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "newwines = [{'title': 'Anta Banderas A 10 2008', \n",
    "             'description': \"This wine will make you speak differently. Maybe not with a charming Spanish accent, but you might think you sound that way.\", \n",
    "             'taster_name': 'Jonathan Kropko', \n",
    "             'taster_twitter_handle': '@jmk5131', \n",
    "             'price': '40.99', \n",
    "             'variety': 'Red Blend', \n",
    "             'location':{\n",
    "                 'region_1': 'Ribera del Duoro', \n",
    "                 'region_2': None, \n",
    "                 'province': 'Valladolid', \n",
    "                 'country': 'Spain', \n",
    "                 'winery': 'Anta Banderas'}},\n",
    "           {'title': 'Barrymore Rose 2013', \n",
    "             'description': \"Someone drank my entire bottle of wine!\", \n",
    "             'taster_name': 'Jonathan Kropko', \n",
    "             'taster_twitter_handle': '@jmk5131', \n",
    "             'price': '14.99', \n",
    "             'variety': 'Rose', \n",
    "             'location':{\n",
    "                 'region_1': 'Monterey', \n",
    "                 'region_2': None, \n",
    "                 'province': 'California', \n",
    "                 'country': 'U.S.', \n",
    "                 'winery': 'Barrymore Vineyard'}},\n",
    "           {'title': '2006 Little Jonathan Winery Cabernet Sauvignon', \n",
    "             'description': \"This upscale crunk juice is OOOKAAAAAAY.\", \n",
    "             'taster_name': 'Jonathan Kropko', \n",
    "             'taster_twitter_handle': '@jmk5131',  \n",
    "             'variety': 'Cabernet Sauvignon', \n",
    "             'location':{\n",
    "                 'region_1': 'Central Coast', \n",
    "                 'region_2': 'Paso Robles', \n",
    "                 'province': 'California', \n",
    "                 'country': 'U.S.', \n",
    "                 'winery': 'Little Jonathan Winery'}}]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To add these three records to the database with one line of code, we use the `.insert_many()` method. To avoid duplicates, we first delete any records of wines titled \"Anta Banderas A 10 2008\", \"Barrymore Rose 2013\", or \"2006 Little Jonathan Winery Cabernet Sauvignon\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pymongo.results.InsertManyResult at 0x1268132c8>"
      ]
     },
     "execution_count": 69,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "winecollection.delete_many({'title': 'Anta Banderas A 10 2008'})\n",
    "winecollection.delete_many({'title': 'Barrymore Rose 2013'})\n",
    "winecollection.delete_many({'title': '2006 Little Jonathan Winery Cabernet Sauvignon'})\n",
    "winecollection.insert_many(newwines)"
   ]
  },
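  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three `delete_many()` calls above can also be written as a single call that matches any title in a list, and the object returned by `insert_many()` has an `inserted_ids` attribute holding the `_id` values that MongoDB assigned. The following is a minimal sketch rather than part of the workflow above: `replace_by_title` is a hypothetical helper name, and it assumes every record has a `title` key.\n",
    "\n",
    "```python\n",
    "def replace_by_title(collection, records):\n",
    "    # Collect the titles of the records we are about to insert\n",
    "    titles = [r['title'] for r in records]\n",
    "    # Delete any existing documents whose title is in that list\n",
    "    collection.delete_many({'title': {'$in': titles}})\n",
    "    # Insert the new documents and return the _id values MongoDB assigned\n",
    "    result = collection.insert_many(records)\n",
    "    return result.inserted_ids\n",
    "```\n",
    "\n",
    "With the objects defined above, the call would be `replace_by_title(winecollection, newwines)`."
   ]
  },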
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Reading Data and Selecting Records\n",
    "To read all of the records in a MongoDB collection, use the `.find()` method and pass an empty JSON dictionary to this method. For the wine reviews collection, we can query the entire collection by typing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "myquery = winecollection.find({})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, the queried data exist within the variable `cursor`. The data are not displayed automatically. To see the data in JSON format, we can employ the `print()` function on elements of the cursor. To see the first element:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'_id': ObjectId('5ed80dbca25fcf746119e3aa'), 'wine_id': 0, 'country': 'Italy', 'description': \"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.\", 'points': 87, 'price': None, 'province': 'Sicily & Sardinia', 'region': 'Etna', 'taster_name': 'Kerin O’Keefe', 'taster_twitter_handle': '@kerinokeefe', 'title': 'Nicosia 2013 Vulkà Bianco  (Etna)', 'variety': 'White Blend', 'winery': 'Nicosia'}\n"
     ]
    }
   ],
   "source": [
    "print(myquery[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And to see more elements, we can use a loop. Here's code to view the first three wines:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'_id': ObjectId('5ed80dbca25fcf746119e3aa'), 'wine_id': 0, 'country': 'Italy', 'description': \"Aromas include tropical fruit, broom, brimstone and dried herb. The palate isn't overly expressive, offering unripened apple, citrus and dried sage alongside brisk acidity.\", 'points': 87, 'price': None, 'province': 'Sicily & Sardinia', 'region': 'Etna', 'taster_name': 'Kerin O’Keefe', 'taster_twitter_handle': '@kerinokeefe', 'title': 'Nicosia 2013 Vulkà Bianco  (Etna)', 'variety': 'White Blend', 'winery': 'Nicosia'}\n",
      "{'_id': ObjectId('5ed80dcca25fcf746119e3ab'), 'wine_id': 1, 'country': 'Portugal', 'description': \"This is ripe and fruity, a wine that is smooth while still structured. Firm tannins are filled out with juicy red berry fruits and freshened with acidity. It's  already drinkable, although it will certainly be better from 2016.\", 'points': 87, 'price': 15.0, 'province': 'Douro', 'region': None, 'taster_name': 'Roger Voss', 'taster_twitter_handle': '@vossroger', 'title': 'Quinta dos Avidagos 2011 Avidagos Red (Douro)', 'variety': 'Portuguese Red', 'winery': 'Quinta dos Avidagos'}\n",
      "{'_id': ObjectId('5ed80dcca25fcf746119e3ac'), 'wine_id': 2, 'country': 'US', 'description': 'Tart and snappy, the flavors of lime flesh and rind dominate. Some green pineapple pokes through, with crisp acidity underscoring the flavors. The wine was all stainless-steel fermented.', 'points': 87, 'price': 14.0, 'province': 'Oregon', 'region': 'Willamette Valley', 'taster_name': 'Paul Gregutt', 'taster_twitter_handle': '@paulgwine\\xa0', 'title': 'Rainstorm 2013 Pinot Gris (Willamette Valley)', 'variety': 'Pinot Gris', 'winery': 'Rainstorm'}\n"
     ]
    }
   ],
   "source": [
    "for i in myquery[0:3]:\n",
    "    print(i)"
   ]
  },
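  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Slicing is one way to look at the first few documents. Cursors also have a `.limit()` method that caps the number of documents returned. A minimal sketch, in which `first_n` is a hypothetical helper name:\n",
    "\n",
    "```python\n",
    "def first_n(collection, n):\n",
    "    # Query every document, but stop after the first n\n",
    "    cursor = collection.find({}).limit(n)\n",
    "    # Exhaust the cursor into an ordinary Python list\n",
    "    return list(cursor)\n",
    "```\n",
    "\n",
    "For example, `first_n(winecollection, 3)` should return a list containing the three wines printed above, assuming the collection has not changed."
   ]
  },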
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Displaying the query output data as a list of JSON dictionaries, however, is not the most useful way to store the data. We need a way to put these data into a dataframe. For that we can use the `dumps()` and `loads()` functions from the `bson` library. These functions work exactly like the `dumps()` and `loads()` functions from the `json` library, but they remove some of the extra components of these JSON dictionaries associated with the database. To query all of the data and to place all of it into a dataframe, we pass the query output to `dumps()`, which converts the query output to plain text. Next we pass this text to `loads()`, which registers the text as a list of JSON dictionaries. Finally we use this list as the argument of `pd.DataFrame.from_records()` to convert the output to a dataframe. For the wine collection, this code is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>wine_id</th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "      <th>location</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5ed80dbca25fcf746119e3aa</td>\n",
       "      <td>0.0</td>\n",
       "      <td>Italy</td>\n",
       "      <td>Aromas include tropical fruit, broom, brimston...</td>\n",
       "      <td>87.0</td>\n",
       "      <td>None</td>\n",
       "      <td>Sicily &amp; Sardinia</td>\n",
       "      <td>Etna</td>\n",
       "      <td>Kerin O’Keefe</td>\n",
       "      <td>@kerinokeefe</td>\n",
       "      <td>Nicosia 2013 Vulkà Bianco  (Etna)</td>\n",
       "      <td>White Blend</td>\n",
       "      <td>Nicosia</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5ed80dcca25fcf746119e3ab</td>\n",
       "      <td>1.0</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>This is ripe and fruity, a wine that is smooth...</td>\n",
       "      <td>87.0</td>\n",
       "      <td>15</td>\n",
       "      <td>Douro</td>\n",
       "      <td>None</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Quinta dos Avidagos 2011 Avidagos Red (Douro)</td>\n",
       "      <td>Portuguese Red</td>\n",
       "      <td>Quinta dos Avidagos</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5ed80dcca25fcf746119e3ac</td>\n",
       "      <td>2.0</td>\n",
       "      <td>US</td>\n",
       "      <td>Tart and snappy, the flavors of lime flesh and...</td>\n",
       "      <td>87.0</td>\n",
       "      <td>14</td>\n",
       "      <td>Oregon</td>\n",
       "      <td>Willamette Valley</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "      <td>@paulgwine</td>\n",
       "      <td>Rainstorm 2013 Pinot Gris (Willamette Valley)</td>\n",
       "      <td>Pinot Gris</td>\n",
       "      <td>Rainstorm</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5ed80dcca25fcf746119e3ad</td>\n",
       "      <td>3.0</td>\n",
       "      <td>US</td>\n",
       "      <td>Pineapple rind, lemon pith and orange blossom ...</td>\n",
       "      <td>87.0</td>\n",
       "      <td>13</td>\n",
       "      <td>Michigan</td>\n",
       "      <td>Lake Michigan Shore</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>St. Julian 2013 Reserve Late Harvest Riesling ...</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>St. Julian</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5ed80dcca25fcf746119e3ae</td>\n",
       "      <td>4.0</td>\n",
       "      <td>US</td>\n",
       "      <td>Much like the regular bottling from 2012, this...</td>\n",
       "      <td>87.0</td>\n",
       "      <td>65</td>\n",
       "      <td>Oregon</td>\n",
       "      <td>Willamette Valley</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "      <td>@paulgwine</td>\n",
       "      <td>Sweet Cheeks 2012 Vintner's Reserve Wild Child...</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>Sweet Cheeks</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103726</th>\n",
       "      <td>5ed80dcfa25fcf74611b78d8</td>\n",
       "      <td>129970.0</td>\n",
       "      <td>France</td>\n",
       "      <td>Big, rich and off-dry, this is powered by inte...</td>\n",
       "      <td>90.0</td>\n",
       "      <td>21</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>Alsace</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...</td>\n",
       "      <td>Gewürztraminer</td>\n",
       "      <td>Domaine Schoffit</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103727</th>\n",
       "      <td>5edd56e5b4e58ce3841e5dea</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>This wine goes great with dinner just like Dwy...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>35</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>2016 Napa Valley Three By Wade Red Blend</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'region_1': 'Napa Valley', 'region_2': None, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103728</th>\n",
       "      <td>5edd56e6b4e58ce3841e5deb</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>This wine will make you speak differently. May...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>40.99</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>Anta Banderas A 10 2008</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'region_1': 'Ribera del Duoro', 'region_2': N...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103729</th>\n",
       "      <td>5edd56e6b4e58ce3841e5dec</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Someone drank my entire bottle of wine!</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.99</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>Barrymore Rose 2013</td>\n",
       "      <td>Rose</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'region_1': 'Monterey', 'region_2': None, 'pr...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>103730</th>\n",
       "      <td>5edd56e6b4e58ce3841e5ded</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>This upscale crunk juice is OOOKAAAAAAY.</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>2006 Little Jonathan Winery Cabernet Sauvignon</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'region_1': 'Central Coast', 'region_2': 'Pas...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>103731 rows × 14 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                             _id   wine_id   country  \\\n",
       "0       5ed80dbca25fcf746119e3aa       0.0     Italy   \n",
       "1       5ed80dcca25fcf746119e3ab       1.0  Portugal   \n",
       "2       5ed80dcca25fcf746119e3ac       2.0        US   \n",
       "3       5ed80dcca25fcf746119e3ad       3.0        US   \n",
       "4       5ed80dcca25fcf746119e3ae       4.0        US   \n",
       "...                          ...       ...       ...   \n",
       "103726  5ed80dcfa25fcf74611b78d8  129970.0    France   \n",
       "103727  5edd56e5b4e58ce3841e5dea       NaN       NaN   \n",
       "103728  5edd56e6b4e58ce3841e5deb       NaN       NaN   \n",
       "103729  5edd56e6b4e58ce3841e5dec       NaN       NaN   \n",
       "103730  5edd56e6b4e58ce3841e5ded       NaN       NaN   \n",
       "\n",
       "                                              description  points  price  \\\n",
       "0       Aromas include tropical fruit, broom, brimston...    87.0   None   \n",
       "1       This is ripe and fruity, a wine that is smooth...    87.0     15   \n",
       "2       Tart and snappy, the flavors of lime flesh and...    87.0     14   \n",
       "3       Pineapple rind, lemon pith and orange blossom ...    87.0     13   \n",
       "4       Much like the regular bottling from 2012, this...    87.0     65   \n",
       "...                                                   ...     ...    ...   \n",
       "103726  Big, rich and off-dry, this is powered by inte...    90.0     21   \n",
       "103727  This wine goes great with dinner just like Dwy...     NaN     35   \n",
       "103728  This wine will make you speak differently. May...     NaN  40.99   \n",
       "103729            Someone drank my entire bottle of wine!     NaN  14.99   \n",
       "103730           This upscale crunk juice is OOOKAAAAAAY.     NaN    NaN   \n",
       "\n",
       "                 province               region         taster_name  \\\n",
       "0       Sicily & Sardinia                 Etna       Kerin O’Keefe   \n",
       "1                   Douro                 None          Roger Voss   \n",
       "2                  Oregon    Willamette Valley        Paul Gregutt   \n",
       "3                Michigan  Lake Michigan Shore  Alexander Peartree   \n",
       "4                  Oregon    Willamette Valley        Paul Gregutt   \n",
       "...                   ...                  ...                 ...   \n",
       "103726             Alsace               Alsace          Roger Voss   \n",
       "103727                NaN                  NaN     Jonathan Kropko   \n",
       "103728                NaN                  NaN     Jonathan Kropko   \n",
       "103729                NaN                  NaN     Jonathan Kropko   \n",
       "103730                NaN                  NaN     Jonathan Kropko   \n",
       "\n",
       "       taster_twitter_handle  \\\n",
       "0               @kerinokeefe   \n",
       "1                 @vossroger   \n",
       "2                @paulgwine    \n",
       "3                       None   \n",
       "4                @paulgwine    \n",
       "...                      ...   \n",
       "103726            @vossroger   \n",
       "103727              @jmk5131   \n",
       "103728              @jmk5131   \n",
       "103729              @jmk5131   \n",
       "103730              @jmk5131   \n",
       "\n",
       "                                                    title             variety  \\\n",
       "0                       Nicosia 2013 Vulkà Bianco  (Etna)         White Blend   \n",
       "1           Quinta dos Avidagos 2011 Avidagos Red (Douro)      Portuguese Red   \n",
       "2           Rainstorm 2013 Pinot Gris (Willamette Valley)          Pinot Gris   \n",
       "3       St. Julian 2013 Reserve Late Harvest Riesling ...            Riesling   \n",
       "4       Sweet Cheeks 2012 Vintner's Reserve Wild Child...          Pinot Noir   \n",
       "...                                                   ...                 ...   \n",
       "103726  Domaine Schoffit 2012 Lieu-dit Harth Cuvée Car...      Gewürztraminer   \n",
       "103727           2016 Napa Valley Three By Wade Red Blend           Red Blend   \n",
       "103728                            Anta Banderas A 10 2008           Red Blend   \n",
       "103729                                Barrymore Rose 2013                Rose   \n",
       "103730     2006 Little Jonathan Winery Cabernet Sauvignon  Cabernet Sauvignon   \n",
       "\n",
       "                     winery                                           location  \n",
       "0                   Nicosia                                                NaN  \n",
       "1       Quinta dos Avidagos                                                NaN  \n",
       "2                 Rainstorm                                                NaN  \n",
       "3                St. Julian                                                NaN  \n",
       "4              Sweet Cheeks                                                NaN  \n",
       "...                     ...                                                ...  \n",
       "103726     Domaine Schoffit                                                NaN  \n",
       "103727                  NaN  {'region_1': 'Napa Valley', 'region_2': None, ...  \n",
       "103728                  NaN  {'region_1': 'Ribera del Duoro', 'region_2': N...  \n",
       "103729                  NaN  {'region_1': 'Monterey', 'region_2': None, 'pr...  \n",
       "103730                  NaN  {'region_1': 'Central Coast', 'region_2': 'Pas...  \n",
       "\n",
       "[103731 rows x 14 columns]"
      ]
     },
     "execution_count": 73,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = winecollection.find({})\n",
    "wine_text = dumps(myquery)\n",
    "wine_records = loads(wine_text)\n",
    "wine_df = pd.DataFrame.from_records(wine_records)\n",
    "wine_df"
   ]
  },
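  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that the `location` column for the four records added above holds whole nested dictionaries. One possible way to spread nested fields into their own columns is `pd.json_normalize()`. The sketch below applies it to two small hand-written records, which are made-up stand-ins for query results rather than documents from the wine collection.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Two illustrative records: one flat, one with a nested 'location' field\n",
    "records = [\n",
    "    {'title': 'Wine A', 'winery': 'Winery A', 'province': 'Oregon'},\n",
    "    {'title': 'Wine B', 'location': {'province': 'California', 'winery': 'Winery B'}},\n",
    "]\n",
    "\n",
    "# json_normalize expands nested dictionaries into dotted column names\n",
    "flat_df = pd.json_normalize(records)\n",
    "print(sorted(flat_df.columns))\n",
    "```\n",
    "\n",
    "The nested keys appear as columns named `location.province` and `location.winery`, with missing values wherever a record lacks that field."
   ]
  },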
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like SQL, read operations in MongoDB can filter records based on logical conditions. Unlike SQL, MongoDB uses different symbols for the common logical operators, and these symbols need to be listed within the JSON formatted query.  A couple of the operators are implicit in the JSON syntax. To search on an equality condition, the general syntax is\n",
    "```\n",
    "{'key' : value}\n",
    "```\n",
    "For example, to find all of the wines in the database that are from virginia, we can use the following query:\n",
    "```\n",
    "{'province' : 'Virginia'}\n",
    "```\n",
    "The other implicit operator is \"and\", which is expressed simply by including more than one key-value pair within the syntax. To specify that a feature `key1` is equal to `value1` AND that `key2` is equal to `value2`, type:\n",
    "```\n",
    "{'key1' : value1,\n",
    " 'key2' : value2}\n",
    "```\n",
    "For example, to filter the data to Pinot Noir wines from Virginia, we can type\n",
    "```\n",
    "{'variety' : 'Pinot Noir',\n",
    " 'province' : 'Virginia'}\n",
    "```\n",
    "For all other logical operators, MongoDB uses special syntax, described below. To use these operators in a query, the general template is general syntax for using an operator within a MongoDB query is\n",
    "```\n",
    "{'key' : {'$operator' : value } }\n",
    "```\n",
    "The operators are listed in the following table:\n",
    "\n",
    "| Operator                                                              | Syntax                                              | Example query                                                                                                      | Example code                                                                      |\n",
    "|-----------------------------------------------------------------------|-----------------------------------------------------|--------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------|\n",
    "| Equal to                                                              | implicit                                            | All wines with scores of 100                                                                                       | `{'points': 100}`                                                                 |\n",
    "| Greater than                                                          | `'$gt'`                                               | All wines that are more expensive than \\\\$30                                                                         | `{'price': {'$gt': 30}}`                                                            |\n",
    "| Greater than or equal to                                              | `'$gte'`                                              | All wines with scores of 95 or higher                                                                              | `{'points': {'$gte': 95}}`                                                          |\n",
    "| Less than                                                             | `'$lt'`                                               | All wines that are cheaper than \\\\$20                                                                                | `{'price': {'$lt': 20}}`                                                            |\n",
    "| Less than or equal to                                                 | `'$lte'`                                              | All wines with scores of 85 or lower                                                                               | `{'points': {'$lte': 85}}`                                                          |\n",
    "| Not equal                                                             | `'$ne'`                                               | Wines that are not red blends                                                                                      | `{'variety': {'$ne': 'Red Blend'}`                                                  |\n",
    "| And                                                                   | implicit                                            | All wines with scores of 100 and prices of \\\\$20 or less                                                             | `{'points': 100, 'price': {'$lte': 20}}`                                            |\n",
    "| Or                                                                    | `'$or': [{condition1}, {condition2}]`                 | All wines with scores of 100, or prices of \\\\$20 or less                                                             | `{'$or': [{'points': 100}, {'price: {'$lte': 20}}]}`                                  |\n",
    "| Exists in a set                                                       | `'$in': [value1, value2, ...]`                        | All wines from Virginia, Maryland, or North Carolina                                                               | `{'province': {'$in': ['Virginia', 'Maryland', 'North Carolina']}}`                 |\n",
    "| Not in a set                                                          | `'$nin'`                                              | All wines except those from Virginia, Maryland, and North Carolina                                                 | `{'province': {'$nin': ['Virginia', 'Maryland', 'North Carolina']}}`                |\n",
    "| Use logical conditions that compare two or more keys                  | `{'$expr': <expression>}`                             | All wines whose price is greater than their score                                                                  | `{'$expr': {'$gt': ['$price', '$points']}}`                                             |\n",
    "| Logical negation (only recommended for use with `$text` and `$regex`) | `'$not'`                                              | All wines whose descriptions do not contain the word \"chocolate\", treating capital and lower-case letters the same | `{'$not': {'description': {'$text': {'$search': 'chocolate', '$caseSensitive': false}}}}` |\n",
    "\n",
    "To quickly see the data that is output by queries that use these operators, I write a function that takes a JSON dictionary as an input, and outputs a `pandas` dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "def mongo_read_query(col, q):\n",
    "    qtext = dumps(col.find(q))\n",
    "    qrec = loads(qtext)\n",
    "    qdf = pd.DataFrame.from_records(qrec)\n",
    "    return qdf"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see all wines with a score of 100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>wine_id</th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5ed80dcca25fcf746119e498</td>\n",
       "      <td>345</td>\n",
       "      <td>Australia</td>\n",
       "      <td>This wine contains some material over 100 year...</td>\n",
       "      <td>100</td>\n",
       "      <td>350.0</td>\n",
       "      <td>Victoria</td>\n",
       "      <td>Rutherglen</td>\n",
       "      <td>Joe Czerwinski</td>\n",
       "      <td>@JoeCz</td>\n",
       "      <td>Chambers Rosewood Vineyards NV Rare Muscat (Ru...</td>\n",
       "      <td>Muscat</td>\n",
       "      <td>Chambers Rosewood Vineyards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5ed80dcea25fcf74611a5511</td>\n",
       "      <td>36528</td>\n",
       "      <td>France</td>\n",
       "      <td>This is a fabulous wine from the greatest Cham...</td>\n",
       "      <td>100</td>\n",
       "      <td>259.0</td>\n",
       "      <td>Champagne</td>\n",
       "      <td>Champagne</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Krug 2002 Brut  (Champagne)</td>\n",
       "      <td>Champagne Blend</td>\n",
       "      <td>Krug</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5ed80dcea25fcf74611a66df</td>\n",
       "      <td>42197</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>This is the latest release of what has long be...</td>\n",
       "      <td>100</td>\n",
       "      <td>450.0</td>\n",
       "      <td>Douro</td>\n",
       "      <td>None</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Casa Ferreirinha 2008 Barca-Velha Red (Douro)</td>\n",
       "      <td>Portuguese Red</td>\n",
       "      <td>Casa Ferreirinha</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5ed80dcea25fcf74611a7214</td>\n",
       "      <td>45781</td>\n",
       "      <td>Italy</td>\n",
       "      <td>This gorgeous, fragrant wine opens with classi...</td>\n",
       "      <td>100</td>\n",
       "      <td>550.0</td>\n",
       "      <td>Tuscany</td>\n",
       "      <td>Brunello di Montalcino</td>\n",
       "      <td>Kerin O’Keefe</td>\n",
       "      <td>@kerinokeefe</td>\n",
       "      <td>Biondi Santi 2010 Riserva  (Brunello di Montal...</td>\n",
       "      <td>Sangiovese</td>\n",
       "      <td>Biondi Santi</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5ed80dcea25fcf74611a9877</td>\n",
       "      <td>58352</td>\n",
       "      <td>France</td>\n",
       "      <td>This is a magnificently solid wine, initially ...</td>\n",
       "      <td>100</td>\n",
       "      <td>150.0</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>Saint-Julien</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Château Léoville Barton 2010  Saint-Julien</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>Château Léoville Barton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5ed80dcfa25fcf74611afacd</td>\n",
       "      <td>89728</td>\n",
       "      <td>France</td>\n",
       "      <td>This latest incarnation of the famous brand is...</td>\n",
       "      <td>100</td>\n",
       "      <td>250.0</td>\n",
       "      <td>Champagne</td>\n",
       "      <td>Champagne</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Louis Roederer 2008 Cristal Vintage Brut  (Cha...</td>\n",
       "      <td>Champagne Blend</td>\n",
       "      <td>Louis Roederer</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>5ed80dcfa25fcf74611aface</td>\n",
       "      <td>89729</td>\n",
       "      <td>France</td>\n",
       "      <td>This new release from a great vintage for Char...</td>\n",
       "      <td>100</td>\n",
       "      <td>617.0</td>\n",
       "      <td>Champagne</td>\n",
       "      <td>Champagne</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Salon 2006 Le Mesnil Blanc de Blancs Brut Char...</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>Salon</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>5ed80dcfa25fcf74611b3fc4</td>\n",
       "      <td>111753</td>\n",
       "      <td>France</td>\n",
       "      <td>Almost black in color, this stunning wine is g...</td>\n",
       "      <td>100</td>\n",
       "      <td>1500.0</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>Pauillac</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Château Lafite Rothschild 2010  Pauillac</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>Château Lafite Rothschild</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>5ed80dcfa25fcf74611b3fc5</td>\n",
       "      <td>111755</td>\n",
       "      <td>France</td>\n",
       "      <td>This is the finest Cheval Blanc for many years...</td>\n",
       "      <td>100</td>\n",
       "      <td>1500.0</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>Saint-Émilion</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Château Cheval Blanc 2010  Saint-Émilion</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>Château Cheval Blanc</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>5ed80dcfa25fcf74611b3fc6</td>\n",
       "      <td>111756</td>\n",
       "      <td>France</td>\n",
       "      <td>A hugely powerful wine, full of dark, brooding...</td>\n",
       "      <td>100</td>\n",
       "      <td>359.0</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>Saint-Julien</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Château Léoville Las Cases 2010  Saint-Julien</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>Château Léoville Las Cases</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>5ed80dcfa25fcf74611b4648</td>\n",
       "      <td>113929</td>\n",
       "      <td>US</td>\n",
       "      <td>In 2005 Charles Smith introduced three high-en...</td>\n",
       "      <td>100</td>\n",
       "      <td>80.0</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Columbia Valley (WA)</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "      <td>@paulgwine</td>\n",
       "      <td>Charles Smith 2006 Royal City Syrah (Columbia ...</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>Charles Smith</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>5ed80dcfa25fcf74611b499d</td>\n",
       "      <td>114972</td>\n",
       "      <td>Portugal</td>\n",
       "      <td>A powerful and ripe wine, strongly influenced ...</td>\n",
       "      <td>100</td>\n",
       "      <td>650.0</td>\n",
       "      <td>Port</td>\n",
       "      <td>None</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Quinta do Noval 2011 Nacional Vintage  (Port)</td>\n",
       "      <td>Port</td>\n",
       "      <td>Quinta do Noval</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>5ed80dcfa25fcf74611b6300</td>\n",
       "      <td>122935</td>\n",
       "      <td>France</td>\n",
       "      <td>Full of ripe fruit, opulent and concentrated, ...</td>\n",
       "      <td>100</td>\n",
       "      <td>848.0</td>\n",
       "      <td>Bordeaux</td>\n",
       "      <td>Pessac-Léognan</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Château Haut-Brion 2014  Pessac-Léognan</td>\n",
       "      <td>Bordeaux-style White Blend</td>\n",
       "      <td>Château Haut-Brion</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>5ed80dcfa25fcf74611b64d4</td>\n",
       "      <td>123545</td>\n",
       "      <td>US</td>\n",
       "      <td>Initially a rather subdued Frog; as if it has ...</td>\n",
       "      <td>100</td>\n",
       "      <td>80.0</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Walla Walla Valley (WA)</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "      <td>@paulgwine</td>\n",
       "      <td>Cayuse 2008 Bionic Frog Syrah (Walla Walla Val...</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>Cayuse</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         _id  wine_id    country  \\\n",
       "0   5ed80dcca25fcf746119e498      345  Australia   \n",
       "1   5ed80dcea25fcf74611a5511    36528     France   \n",
       "2   5ed80dcea25fcf74611a66df    42197   Portugal   \n",
       "3   5ed80dcea25fcf74611a7214    45781      Italy   \n",
       "4   5ed80dcea25fcf74611a9877    58352     France   \n",
       "5   5ed80dcfa25fcf74611afacd    89728     France   \n",
       "6   5ed80dcfa25fcf74611aface    89729     France   \n",
       "7   5ed80dcfa25fcf74611b3fc4   111753     France   \n",
       "8   5ed80dcfa25fcf74611b3fc5   111755     France   \n",
       "9   5ed80dcfa25fcf74611b3fc6   111756     France   \n",
       "10  5ed80dcfa25fcf74611b4648   113929         US   \n",
       "11  5ed80dcfa25fcf74611b499d   114972   Portugal   \n",
       "12  5ed80dcfa25fcf74611b6300   122935     France   \n",
       "13  5ed80dcfa25fcf74611b64d4   123545         US   \n",
       "\n",
       "                                          description  points   price  \\\n",
       "0   This wine contains some material over 100 year...     100   350.0   \n",
       "1   This is a fabulous wine from the greatest Cham...     100   259.0   \n",
       "2   This is the latest release of what has long be...     100   450.0   \n",
       "3   This gorgeous, fragrant wine opens with classi...     100   550.0   \n",
       "4   This is a magnificently solid wine, initially ...     100   150.0   \n",
       "5   This latest incarnation of the famous brand is...     100   250.0   \n",
       "6   This new release from a great vintage for Char...     100   617.0   \n",
       "7   Almost black in color, this stunning wine is g...     100  1500.0   \n",
       "8   This is the finest Cheval Blanc for many years...     100  1500.0   \n",
       "9   A hugely powerful wine, full of dark, brooding...     100   359.0   \n",
       "10  In 2005 Charles Smith introduced three high-en...     100    80.0   \n",
       "11  A powerful and ripe wine, strongly influenced ...     100   650.0   \n",
       "12  Full of ripe fruit, opulent and concentrated, ...     100   848.0   \n",
       "13  Initially a rather subdued Frog; as if it has ...     100    80.0   \n",
       "\n",
       "      province                   region     taster_name taster_twitter_handle  \\\n",
       "0     Victoria               Rutherglen  Joe Czerwinski                @JoeCz   \n",
       "1    Champagne                Champagne      Roger Voss            @vossroger   \n",
       "2        Douro                     None      Roger Voss            @vossroger   \n",
       "3      Tuscany   Brunello di Montalcino   Kerin O’Keefe          @kerinokeefe   \n",
       "4     Bordeaux             Saint-Julien      Roger Voss            @vossroger   \n",
       "5    Champagne                Champagne      Roger Voss            @vossroger   \n",
       "6    Champagne                Champagne      Roger Voss            @vossroger   \n",
       "7     Bordeaux                 Pauillac      Roger Voss            @vossroger   \n",
       "8     Bordeaux            Saint-Émilion      Roger Voss            @vossroger   \n",
       "9     Bordeaux             Saint-Julien      Roger Voss            @vossroger   \n",
       "10  Washington     Columbia Valley (WA)    Paul Gregutt           @paulgwine    \n",
       "11        Port                     None      Roger Voss            @vossroger   \n",
       "12    Bordeaux           Pessac-Léognan      Roger Voss            @vossroger   \n",
       "13  Washington  Walla Walla Valley (WA)    Paul Gregutt           @paulgwine    \n",
       "\n",
       "                                                title  \\\n",
       "0   Chambers Rosewood Vineyards NV Rare Muscat (Ru...   \n",
       "1                         Krug 2002 Brut  (Champagne)   \n",
       "2       Casa Ferreirinha 2008 Barca-Velha Red (Douro)   \n",
       "3   Biondi Santi 2010 Riserva  (Brunello di Montal...   \n",
       "4          Château Léoville Barton 2010  Saint-Julien   \n",
       "5   Louis Roederer 2008 Cristal Vintage Brut  (Cha...   \n",
       "6   Salon 2006 Le Mesnil Blanc de Blancs Brut Char...   \n",
       "7            Château Lafite Rothschild 2010  Pauillac   \n",
       "8            Château Cheval Blanc 2010  Saint-Émilion   \n",
       "9       Château Léoville Las Cases 2010  Saint-Julien   \n",
       "10  Charles Smith 2006 Royal City Syrah (Columbia ...   \n",
       "11      Quinta do Noval 2011 Nacional Vintage  (Port)   \n",
       "12            Château Haut-Brion 2014  Pessac-Léognan   \n",
       "13  Cayuse 2008 Bionic Frog Syrah (Walla Walla Val...   \n",
       "\n",
       "                       variety                       winery  \n",
       "0                       Muscat  Chambers Rosewood Vineyards  \n",
       "1              Champagne Blend                         Krug  \n",
       "2               Portuguese Red             Casa Ferreirinha  \n",
       "3                   Sangiovese                 Biondi Santi  \n",
       "4     Bordeaux-style Red Blend      Château Léoville Barton  \n",
       "5              Champagne Blend               Louis Roederer  \n",
       "6                   Chardonnay                        Salon  \n",
       "7     Bordeaux-style Red Blend    Château Lafite Rothschild  \n",
       "8     Bordeaux-style Red Blend         Château Cheval Blanc  \n",
       "9     Bordeaux-style Red Blend   Château Léoville Las Cases  \n",
       "10                       Syrah                Charles Smith  \n",
       "11                        Port              Quinta do Noval  \n",
       "12  Bordeaux-style White Blend           Château Haut-Brion  \n",
       "13                       Syrah                       Cayuse  "
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = {'points': 100}\n",
    "mongo_read_query(winecollection, myquery)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
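    "To see wines with a score of 100 that cost less than \\$100, we add a condition on `price` using the `$lt` (less than) operator. Listing both fields in the same query document requires a wine to satisfy both conditions:"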
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>wine_id</th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5ed80dcfa25fcf74611b4648</td>\n",
       "      <td>113929</td>\n",
       "      <td>US</td>\n",
       "      <td>In 2005 Charles Smith introduced three high-en...</td>\n",
       "      <td>100</td>\n",
       "      <td>80.0</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Columbia Valley (WA)</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "      <td>@paulgwine</td>\n",
       "      <td>Charles Smith 2006 Royal City Syrah (Columbia ...</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>Charles Smith</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5ed80dcfa25fcf74611b64d4</td>\n",
       "      <td>123545</td>\n",
       "      <td>US</td>\n",
       "      <td>Initially a rather subdued Frog; as if it has ...</td>\n",
       "      <td>100</td>\n",
       "      <td>80.0</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Walla Walla Valley (WA)</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "      <td>@paulgwine</td>\n",
       "      <td>Cayuse 2008 Bionic Frog Syrah (Walla Walla Val...</td>\n",
       "      <td>Syrah</td>\n",
       "      <td>Cayuse</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        _id  wine_id country  \\\n",
       "0  5ed80dcfa25fcf74611b4648   113929      US   \n",
       "1  5ed80dcfa25fcf74611b64d4   123545      US   \n",
       "\n",
       "                                         description  points  price  \\\n",
       "0  In 2005 Charles Smith introduced three high-en...     100   80.0   \n",
       "1  Initially a rather subdued Frog; as if it has ...     100   80.0   \n",
       "\n",
       "     province                   region   taster_name taster_twitter_handle  \\\n",
       "0  Washington     Columbia Valley (WA)  Paul Gregutt           @paulgwine    \n",
       "1  Washington  Walla Walla Valley (WA)  Paul Gregutt           @paulgwine    \n",
       "\n",
       "                                               title variety         winery  \n",
       "0  Charles Smith 2006 Royal City Syrah (Columbia ...   Syrah  Charles Smith  \n",
       "1  Cayuse 2008 Bionic Frog Syrah (Walla Walla Val...   Syrah         Cayuse  "
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = {'points': 100, 'price': {'$lt': 100}}\n",
    "mongo_read_query(winecollection, myquery)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see wines from Ohio or North Carolina, we use the `$in` operator, which matches any document whose field value appears in a given list:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>wine_id</th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5ed80dcea25fcf74611a0d3f</td>\n",
       "      <td>13402</td>\n",
       "      <td>US</td>\n",
       "      <td>Clove and pepper spice the dark red cherry aro...</td>\n",
       "      <td>86</td>\n",
       "      <td>17.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Swan Creek</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>Raffaldini 2007 Montepulciano (Swan Creek)</td>\n",
       "      <td>Montepulciano</td>\n",
       "      <td>Raffaldini</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5ed80dcea25fcf74611a167b</td>\n",
       "      <td>16270</td>\n",
       "      <td>US</td>\n",
       "      <td>The nose shows an aroma of blackberry that is ...</td>\n",
       "      <td>86</td>\n",
       "      <td>17.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Yadkin Valley</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>Shadow Springs 2011 Cabernet Franc (Yadkin Val...</td>\n",
       "      <td>Cabernet Franc</td>\n",
       "      <td>Shadow Springs</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5ed80dcea25fcf74611a20a2</td>\n",
       "      <td>19566</td>\n",
       "      <td>US</td>\n",
       "      <td>Fruits, flowers and spice should lead the nose...</td>\n",
       "      <td>80</td>\n",
       "      <td>15.0</td>\n",
       "      <td>Ohio</td>\n",
       "      <td>Ohio</td>\n",
       "      <td>Susan Kostrzewa</td>\n",
       "      <td>@suskostrzewa</td>\n",
       "      <td>Hermes 2006 Estate Bottled Nebbiolo (Ohio)</td>\n",
       "      <td>Nebbiolo</td>\n",
       "      <td>Hermes</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5ed80dcea25fcf74611a23ee</td>\n",
       "      <td>20592</td>\n",
       "      <td>US</td>\n",
       "      <td>Stewed blackberries and muddled cherries perva...</td>\n",
       "      <td>86</td>\n",
       "      <td>23.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Swan Creek</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>Raffaldini 2012 Riserva Sangiovese (Swan Creek)</td>\n",
       "      <td>Sangiovese</td>\n",
       "      <td>Raffaldini</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5ed80dcea25fcf74611a5fc4</td>\n",
       "      <td>39935</td>\n",
       "      <td>US</td>\n",
       "      <td>Friendly, appealing flavors fo pear, lychee, a...</td>\n",
       "      <td>84</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Ohio</td>\n",
       "      <td>Grand River Valley</td>\n",
       "      <td>Susan Kostrzewa</td>\n",
       "      <td>@suskostrzewa</td>\n",
       "      <td>Debonné 2008 Reserve Riesling (Grand River Val...</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>Debonné</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>5ed80dcea25fcf74611a686c</td>\n",
       "      <td>42692</td>\n",
       "      <td>US</td>\n",
       "      <td>Friendly, appealing flavors fo pear, lychee, a...</td>\n",
       "      <td>84</td>\n",
       "      <td>12.0</td>\n",
       "      <td>Ohio</td>\n",
       "      <td>Grand River Valley</td>\n",
       "      <td>Susan Kostrzewa</td>\n",
       "      <td>@suskostrzewa</td>\n",
       "      <td>Debonné 2008 Reserve Riesling (Grand River Val...</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>Debonné</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>5ed80dcea25fcf74611a7033</td>\n",
       "      <td>45209</td>\n",
       "      <td>US</td>\n",
       "      <td>Freshly squeezed lemons, lime and pretty white...</td>\n",
       "      <td>84</td>\n",
       "      <td>16.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Swan Creek</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>Raffaldini 2009 Pinot Grigio (Swan Creek)</td>\n",
       "      <td>Pinot Grigio</td>\n",
       "      <td>Raffaldini</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>5ed80dcea25fcf74611a74db</td>\n",
       "      <td>46670</td>\n",
       "      <td>US</td>\n",
       "      <td>Black fruit aromas show over toasted vanilla a...</td>\n",
       "      <td>85</td>\n",
       "      <td>29.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Swan Creek</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>Raffaldini 2012 Riserva Montepulciano (Swan Cr...</td>\n",
       "      <td>Montepulciano</td>\n",
       "      <td>Raffaldini</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>5ed80dcea25fcf74611a7c95</td>\n",
       "      <td>49154</td>\n",
       "      <td>US</td>\n",
       "      <td>Lean and racy, with limes and tart green apple...</td>\n",
       "      <td>86</td>\n",
       "      <td>10.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Joe Czerwinski</td>\n",
       "      <td>@JoeCz</td>\n",
       "      <td>Shelton Vineyards 2002 Riesling (North Carolina)</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>Shelton Vineyards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>5ed80dcea25fcf74611a8518</td>\n",
       "      <td>52078</td>\n",
       "      <td>US</td>\n",
       "      <td>Charred oak, green herbs and vanilla spice not...</td>\n",
       "      <td>82</td>\n",
       "      <td>18.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Yadkin Valley</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>Divine Llama 2007 In a Heart Beat Red (Yadkin ...</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Divine Llama</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>5ed80dcea25fcf74611a8fe5</td>\n",
       "      <td>55687</td>\n",
       "      <td>US</td>\n",
       "      <td>Mostly Sangiovese with a small dose of Petit V...</td>\n",
       "      <td>84</td>\n",
       "      <td>17.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Swan Creek</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>Raffaldini 2007 Riserva Sangiovese (Swan Creek)</td>\n",
       "      <td>Sangiovese</td>\n",
       "      <td>Raffaldini</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>5ed80dcea25fcf74611a8fe9</td>\n",
       "      <td>55693</td>\n",
       "      <td>US</td>\n",
       "      <td>Aromas of toasted oak, green leaves, vanilla a...</td>\n",
       "      <td>84</td>\n",
       "      <td>25.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>RayLen 2006 Eagle's Select Red Wine Red (North...</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>RayLen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>5ed80dcfa25fcf74611ac51f</td>\n",
       "      <td>72535</td>\n",
       "      <td>US</td>\n",
       "      <td>Who knew Ohio made such tasty Chardonnay? Brig...</td>\n",
       "      <td>87</td>\n",
       "      <td>11.0</td>\n",
       "      <td>Ohio</td>\n",
       "      <td>Grand River Valley</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>Debonné 2009 Chardonnay (Grand River Valley)</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>Debonné</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>5ed80dcfa25fcf74611ae168</td>\n",
       "      <td>81619</td>\n",
       "      <td>US</td>\n",
       "      <td>Strawberry and raspberry Kool-Aid aromas are s...</td>\n",
       "      <td>85</td>\n",
       "      <td>21.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Swan Creek</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>Laurel Gray 2012 Estate Grown Cabernet Franc (...</td>\n",
       "      <td>Cabernet Franc</td>\n",
       "      <td>Laurel Gray</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>14</th>\n",
       "      <td>5ed80dcfa25fcf74611af839</td>\n",
       "      <td>88841</td>\n",
       "      <td>US</td>\n",
       "      <td>This is a vibrant, energetic Chardonnay that s...</td>\n",
       "      <td>84</td>\n",
       "      <td>17.0</td>\n",
       "      <td>Ohio</td>\n",
       "      <td>Grand River Valley</td>\n",
       "      <td>Susan Kostrzewa</td>\n",
       "      <td>@suskostrzewa</td>\n",
       "      <td>Debonné 2007 Vintner's Selection Chardonnay (G...</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>Debonné</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>15</th>\n",
       "      <td>5ed80dcfa25fcf74611b08bb</td>\n",
       "      <td>94267</td>\n",
       "      <td>US</td>\n",
       "      <td>Although the nose offers darker notes of petro...</td>\n",
       "      <td>83</td>\n",
       "      <td>15.0</td>\n",
       "      <td>Ohio</td>\n",
       "      <td>Grand River Valley</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>Debonné 2008 Lot 807 Reserve Riesling (Grand R...</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>Debonné</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>16</th>\n",
       "      <td>5ed80dcfa25fcf74611b08be</td>\n",
       "      <td>94275</td>\n",
       "      <td>US</td>\n",
       "      <td>The nose on this bright red blend from Sanders...</td>\n",
       "      <td>83</td>\n",
       "      <td>18.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Yadkin Valley</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>Sanders Ridge 2008 Big Woods Red (Yadkin Valley)</td>\n",
       "      <td>Bordeaux-style Red Blend</td>\n",
       "      <td>Sanders Ridge</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>17</th>\n",
       "      <td>5ed80dcfa25fcf74611b13ff</td>\n",
       "      <td>97836</td>\n",
       "      <td>US</td>\n",
       "      <td>Smoke wafts over pressed apple and lemon notes...</td>\n",
       "      <td>83</td>\n",
       "      <td>13.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>RayLen 2009 Riesling (North Carolina)</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>RayLen</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>18</th>\n",
       "      <td>5ed80dcfa25fcf74611b1c1f</td>\n",
       "      <td>100445</td>\n",
       "      <td>US</td>\n",
       "      <td>Fresh minerality and dancing floral notes make...</td>\n",
       "      <td>84</td>\n",
       "      <td>11.0</td>\n",
       "      <td>Ohio</td>\n",
       "      <td>Grand River Valley</td>\n",
       "      <td>Susan Kostrzewa</td>\n",
       "      <td>@suskostrzewa</td>\n",
       "      <td>Debonné 2006 Reserve Riesling (Grand River Val...</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>Debonné</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>19</th>\n",
       "      <td>5ed80dcfa25fcf74611b1c23</td>\n",
       "      <td>100452</td>\n",
       "      <td>US</td>\n",
       "      <td>A slightly floral but lively nose is followed ...</td>\n",
       "      <td>84</td>\n",
       "      <td>15.0</td>\n",
       "      <td>Ohio</td>\n",
       "      <td>Grand River Valley</td>\n",
       "      <td>Susan Kostrzewa</td>\n",
       "      <td>@suskostrzewa</td>\n",
       "      <td>Debonné 2006 Lot 707 Reserve Riesling (Grand R...</td>\n",
       "      <td>Riesling</td>\n",
       "      <td>Debonné</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>20</th>\n",
       "      <td>5ed80dcfa25fcf74611b29f1</td>\n",
       "      <td>104722</td>\n",
       "      <td>US</td>\n",
       "      <td>There are enticing hints of berries and cream ...</td>\n",
       "      <td>86</td>\n",
       "      <td>15.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>Biltmore Estate 2010 Reserve Chardonnay (North...</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>Biltmore Estate</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>21</th>\n",
       "      <td>5ed80dcfa25fcf74611b4fb6</td>\n",
       "      <td>116912</td>\n",
       "      <td>US</td>\n",
       "      <td>Black cherry aromas are dwarfed by notes of wi...</td>\n",
       "      <td>83</td>\n",
       "      <td>24.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Swan Creek</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>Raffaldini 2012 Montepulciano (Swan Creek)</td>\n",
       "      <td>Montepulciano</td>\n",
       "      <td>Raffaldini</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>22</th>\n",
       "      <td>5ed80dcfa25fcf74611b55ee</td>\n",
       "      <td>118895</td>\n",
       "      <td>US</td>\n",
       "      <td>Bright red fruits achieve a decent amount of r...</td>\n",
       "      <td>84</td>\n",
       "      <td>18.0</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>North Carolina</td>\n",
       "      <td>Anna Lee C. Iijima</td>\n",
       "      <td>None</td>\n",
       "      <td>RayLen 2008 Category 5 Red Wine Red (North Car...</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>RayLen</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                         _id  wine_id country  \\\n",
       "0   5ed80dcea25fcf74611a0d3f    13402      US   \n",
       "1   5ed80dcea25fcf74611a167b    16270      US   \n",
       "2   5ed80dcea25fcf74611a20a2    19566      US   \n",
       "3   5ed80dcea25fcf74611a23ee    20592      US   \n",
       "4   5ed80dcea25fcf74611a5fc4    39935      US   \n",
       "5   5ed80dcea25fcf74611a686c    42692      US   \n",
       "6   5ed80dcea25fcf74611a7033    45209      US   \n",
       "7   5ed80dcea25fcf74611a74db    46670      US   \n",
       "8   5ed80dcea25fcf74611a7c95    49154      US   \n",
       "9   5ed80dcea25fcf74611a8518    52078      US   \n",
       "10  5ed80dcea25fcf74611a8fe5    55687      US   \n",
       "11  5ed80dcea25fcf74611a8fe9    55693      US   \n",
       "12  5ed80dcfa25fcf74611ac51f    72535      US   \n",
       "13  5ed80dcfa25fcf74611ae168    81619      US   \n",
       "14  5ed80dcfa25fcf74611af839    88841      US   \n",
       "15  5ed80dcfa25fcf74611b08bb    94267      US   \n",
       "16  5ed80dcfa25fcf74611b08be    94275      US   \n",
       "17  5ed80dcfa25fcf74611b13ff    97836      US   \n",
       "18  5ed80dcfa25fcf74611b1c1f   100445      US   \n",
       "19  5ed80dcfa25fcf74611b1c23   100452      US   \n",
       "20  5ed80dcfa25fcf74611b29f1   104722      US   \n",
       "21  5ed80dcfa25fcf74611b4fb6   116912      US   \n",
       "22  5ed80dcfa25fcf74611b55ee   118895      US   \n",
       "\n",
       "                                          description  points  price  \\\n",
       "0   Clove and pepper spice the dark red cherry aro...      86   17.0   \n",
       "1   The nose shows an aroma of blackberry that is ...      86   17.0   \n",
       "2   Fruits, flowers and spice should lead the nose...      80   15.0   \n",
       "3   Stewed blackberries and muddled cherries perva...      86   23.0   \n",
       "4   Friendly, appealing flavors fo pear, lychee, a...      84   12.0   \n",
       "5   Friendly, appealing flavors fo pear, lychee, a...      84   12.0   \n",
       "6   Freshly squeezed lemons, lime and pretty white...      84   16.0   \n",
       "7   Black fruit aromas show over toasted vanilla a...      85   29.0   \n",
       "8   Lean and racy, with limes and tart green apple...      86   10.0   \n",
       "9   Charred oak, green herbs and vanilla spice not...      82   18.0   \n",
       "10  Mostly Sangiovese with a small dose of Petit V...      84   17.0   \n",
       "11  Aromas of toasted oak, green leaves, vanilla a...      84   25.0   \n",
       "12  Who knew Ohio made such tasty Chardonnay? Brig...      87   11.0   \n",
       "13  Strawberry and raspberry Kool-Aid aromas are s...      85   21.0   \n",
       "14  This is a vibrant, energetic Chardonnay that s...      84   17.0   \n",
       "15  Although the nose offers darker notes of petro...      83   15.0   \n",
       "16  The nose on this bright red blend from Sanders...      83   18.0   \n",
       "17  Smoke wafts over pressed apple and lemon notes...      83   13.0   \n",
       "18  Fresh minerality and dancing floral notes make...      84   11.0   \n",
       "19  A slightly floral but lively nose is followed ...      84   15.0   \n",
       "20  There are enticing hints of berries and cream ...      86   15.0   \n",
       "21  Black cherry aromas are dwarfed by notes of wi...      83   24.0   \n",
       "22  Bright red fruits achieve a decent amount of r...      84   18.0   \n",
       "\n",
       "          province              region         taster_name  \\\n",
       "0   North Carolina          Swan Creek  Anna Lee C. Iijima   \n",
       "1   North Carolina       Yadkin Valley  Alexander Peartree   \n",
       "2             Ohio                Ohio     Susan Kostrzewa   \n",
       "3   North Carolina          Swan Creek  Alexander Peartree   \n",
       "4             Ohio  Grand River Valley     Susan Kostrzewa   \n",
       "5             Ohio  Grand River Valley     Susan Kostrzewa   \n",
       "6   North Carolina          Swan Creek  Anna Lee C. Iijima   \n",
       "7   North Carolina          Swan Creek  Alexander Peartree   \n",
       "8   North Carolina      North Carolina      Joe Czerwinski   \n",
       "9   North Carolina       Yadkin Valley  Anna Lee C. Iijima   \n",
       "10  North Carolina          Swan Creek  Anna Lee C. Iijima   \n",
       "11  North Carolina      North Carolina  Anna Lee C. Iijima   \n",
       "12            Ohio  Grand River Valley  Anna Lee C. Iijima   \n",
       "13  North Carolina          Swan Creek  Alexander Peartree   \n",
       "14            Ohio  Grand River Valley     Susan Kostrzewa   \n",
       "15            Ohio  Grand River Valley  Anna Lee C. Iijima   \n",
       "16  North Carolina       Yadkin Valley  Anna Lee C. Iijima   \n",
       "17  North Carolina      North Carolina  Anna Lee C. Iijima   \n",
       "18            Ohio  Grand River Valley     Susan Kostrzewa   \n",
       "19            Ohio  Grand River Valley     Susan Kostrzewa   \n",
       "20  North Carolina      North Carolina  Anna Lee C. Iijima   \n",
       "21  North Carolina          Swan Creek  Alexander Peartree   \n",
       "22  North Carolina      North Carolina  Anna Lee C. Iijima   \n",
       "\n",
       "   taster_twitter_handle                                              title  \\\n",
       "0                   None         Raffaldini 2007 Montepulciano (Swan Creek)   \n",
       "1                   None  Shadow Springs 2011 Cabernet Franc (Yadkin Val...   \n",
       "2          @suskostrzewa         Hermes 2006 Estate Bottled Nebbiolo (Ohio)   \n",
       "3                   None    Raffaldini 2012 Riserva Sangiovese (Swan Creek)   \n",
       "4          @suskostrzewa  Debonné 2008 Reserve Riesling (Grand River Val...   \n",
       "5          @suskostrzewa  Debonné 2008 Reserve Riesling (Grand River Val...   \n",
       "6                   None          Raffaldini 2009 Pinot Grigio (Swan Creek)   \n",
       "7                   None  Raffaldini 2012 Riserva Montepulciano (Swan Cr...   \n",
       "8                 @JoeCz   Shelton Vineyards 2002 Riesling (North Carolina)   \n",
       "9                   None  Divine Llama 2007 In a Heart Beat Red (Yadkin ...   \n",
       "10                  None    Raffaldini 2007 Riserva Sangiovese (Swan Creek)   \n",
       "11                  None  RayLen 2006 Eagle's Select Red Wine Red (North...   \n",
       "12                  None       Debonné 2009 Chardonnay (Grand River Valley)   \n",
       "13                  None  Laurel Gray 2012 Estate Grown Cabernet Franc (...   \n",
       "14         @suskostrzewa  Debonné 2007 Vintner's Selection Chardonnay (G...   \n",
       "15                  None  Debonné 2008 Lot 807 Reserve Riesling (Grand R...   \n",
       "16                  None   Sanders Ridge 2008 Big Woods Red (Yadkin Valley)   \n",
       "17                  None              RayLen 2009 Riesling (North Carolina)   \n",
       "18         @suskostrzewa  Debonné 2006 Reserve Riesling (Grand River Val...   \n",
       "19         @suskostrzewa  Debonné 2006 Lot 707 Reserve Riesling (Grand R...   \n",
       "20                  None  Biltmore Estate 2010 Reserve Chardonnay (North...   \n",
       "21                  None         Raffaldini 2012 Montepulciano (Swan Creek)   \n",
       "22                  None  RayLen 2008 Category 5 Red Wine Red (North Car...   \n",
       "\n",
       "                     variety             winery  \n",
       "0              Montepulciano         Raffaldini  \n",
       "1             Cabernet Franc     Shadow Springs  \n",
       "2                   Nebbiolo             Hermes  \n",
       "3                 Sangiovese         Raffaldini  \n",
       "4                   Riesling            Debonné  \n",
       "5                   Riesling            Debonné  \n",
       "6               Pinot Grigio         Raffaldini  \n",
       "7              Montepulciano         Raffaldini  \n",
       "8                   Riesling  Shelton Vineyards  \n",
       "9                   R. Blend       Divine Llama  \n",
       "10                Sangiovese         Raffaldini  \n",
       "11  Bordeaux-style Red Blend             RayLen  \n",
       "12                Chardonnay            Debonné  \n",
       "13            Cabernet Franc        Laurel Gray  \n",
       "14                Chardonnay            Debonné  \n",
       "15                  Riesling            Debonné  \n",
       "16  Bordeaux-style Red Blend      Sanders Ridge  \n",
       "17                  Riesling             RayLen  \n",
       "18                  Riesling            Debonné  \n",
       "19                  Riesling            Debonné  \n",
       "20                Chardonnay    Biltmore Estate  \n",
       "21             Montepulciano         Raffaldini  \n",
       "22                  R. Blend             RayLen  "
      ]
     },
     "execution_count": 77,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = {'province': {'$in': ['Ohio','North Carolina']}}\n",
    "mongo_read_query(winecollection, myquery)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To illustrate the \"or\" operator, we can query all wines that are either from Virginia, or have a score of 100:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>wine_id</th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5ed80dcca25fcf746119e3bd</td>\n",
       "      <td>19</td>\n",
       "      <td>US</td>\n",
       "      <td>Red fruit aromas pervade on the nose, with cig...</td>\n",
       "      <td>87</td>\n",
       "      <td>32.0</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>Quiévremont 2012 Meritage (Virginia)</td>\n",
       "      <td>Meritage</td>\n",
       "      <td>Quiévremont</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5ed80dcca25fcf746119e3be</td>\n",
       "      <td>20</td>\n",
       "      <td>US</td>\n",
       "      <td>Ripe aromas of dark berries mingle with ample ...</td>\n",
       "      <td>87</td>\n",
       "      <td>23.0</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>Quiévremont 2012 Vin de Maison Red (Virginia)</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Quiévremont</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5ed80dcca25fcf746119e498</td>\n",
       "      <td>345</td>\n",
       "      <td>Australia</td>\n",
       "      <td>This wine contains some material over 100 year...</td>\n",
       "      <td>100</td>\n",
       "      <td>350.0</td>\n",
       "      <td>Victoria</td>\n",
       "      <td>Rutherglen</td>\n",
       "      <td>Joe Czerwinski</td>\n",
       "      <td>@JoeCz</td>\n",
       "      <td>Chambers Rosewood Vineyards NV Rare Muscat (Ru...</td>\n",
       "      <td>Muscat</td>\n",
       "      <td>Chambers Rosewood Vineyards</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5ed80dcea25fcf746119e8d0</td>\n",
       "      <td>1625</td>\n",
       "      <td>US</td>\n",
       "      <td>Popping with aromas of lychee, rose, geranium ...</td>\n",
       "      <td>85</td>\n",
       "      <td>16.0</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Carrie Dykes</td>\n",
       "      <td>None</td>\n",
       "      <td>The Williamsburg Winery 2015 A Midsummer Night...</td>\n",
       "      <td>White Blend</td>\n",
       "      <td>The Williamsburg Winery</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5ed80dcea25fcf746119e8d6</td>\n",
       "      <td>1631</td>\n",
       "      <td>US</td>\n",
       "      <td>Powerful aromas of lychee, mango and peach giv...</td>\n",
       "      <td>85</td>\n",
       "      <td>22.0</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Middleburg</td>\n",
       "      <td>Carrie Dykes</td>\n",
       "      <td>None</td>\n",
       "      <td>Blue Valley 2015 Muskat Ottonel (Middleburg)</td>\n",
       "      <td>Muskat Ottonel</td>\n",
       "      <td>Blue Valley</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>388</th>\n",
       "      <td>5ed80dcfa25fcf74611b71b3</td>\n",
       "      <td>127736</td>\n",
       "      <td>US</td>\n",
       "      <td>Dense with alluring aromas, this wine is full ...</td>\n",
       "      <td>88</td>\n",
       "      <td>30.0</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Carrie Dykes</td>\n",
       "      <td>None</td>\n",
       "      <td>Early Mountain 2015 Elevation Red (Virginia)</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Early Mountain</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>389</th>\n",
       "      <td>5ed80dcfa25fcf74611b71bd</td>\n",
       "      <td>127746</td>\n",
       "      <td>US</td>\n",
       "      <td>A grape known in Uruguay and Madiran has prove...</td>\n",
       "      <td>88</td>\n",
       "      <td>25.0</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Carrie Dykes</td>\n",
       "      <td>None</td>\n",
       "      <td>Horton 2014 Tannat (Virginia)</td>\n",
       "      <td>Tannat</td>\n",
       "      <td>Horton</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>390</th>\n",
       "      <td>5ed80dcfa25fcf74611b7447</td>\n",
       "      <td>128576</td>\n",
       "      <td>US</td>\n",
       "      <td>Peach and steely lemon aromas carry to a citru...</td>\n",
       "      <td>87</td>\n",
       "      <td>28.0</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Monticello</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>Pollak 2012 Reserve Chardonnay (Monticello)</td>\n",
       "      <td>Chardonnay</td>\n",
       "      <td>Pollak</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>391</th>\n",
       "      <td>5ed80dcfa25fcf74611b7708</td>\n",
       "      <td>129422</td>\n",
       "      <td>US</td>\n",
       "      <td>The nose of this wine is bursting with raspber...</td>\n",
       "      <td>89</td>\n",
       "      <td>32.0</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Monticello</td>\n",
       "      <td>Carrie Dykes</td>\n",
       "      <td>None</td>\n",
       "      <td>Stinson 2014 Meritage (Monticello)</td>\n",
       "      <td>Meritage</td>\n",
       "      <td>Stinson</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>392</th>\n",
       "      <td>5ed80dcfa25fcf74611b772a</td>\n",
       "      <td>129459</td>\n",
       "      <td>US</td>\n",
       "      <td>Somehow, winemaker Luca Paschina manages to ma...</td>\n",
       "      <td>87</td>\n",
       "      <td>23.0</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Carrie Dykes</td>\n",
       "      <td>None</td>\n",
       "      <td>Barboursville Vineyards 2015 Reserve Vermentin...</td>\n",
       "      <td>Vermentino</td>\n",
       "      <td>Barboursville Vineyards</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>393 rows × 13 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                          _id  wine_id    country  \\\n",
       "0    5ed80dcca25fcf746119e3bd       19         US   \n",
       "1    5ed80dcca25fcf746119e3be       20         US   \n",
       "2    5ed80dcca25fcf746119e498      345  Australia   \n",
       "3    5ed80dcea25fcf746119e8d0     1625         US   \n",
       "4    5ed80dcea25fcf746119e8d6     1631         US   \n",
       "..                        ...      ...        ...   \n",
       "388  5ed80dcfa25fcf74611b71b3   127736         US   \n",
       "389  5ed80dcfa25fcf74611b71bd   127746         US   \n",
       "390  5ed80dcfa25fcf74611b7447   128576         US   \n",
       "391  5ed80dcfa25fcf74611b7708   129422         US   \n",
       "392  5ed80dcfa25fcf74611b772a   129459         US   \n",
       "\n",
       "                                           description  points  price  \\\n",
       "0    Red fruit aromas pervade on the nose, with cig...      87   32.0   \n",
       "1    Ripe aromas of dark berries mingle with ample ...      87   23.0   \n",
       "2    This wine contains some material over 100 year...     100  350.0   \n",
       "3    Popping with aromas of lychee, rose, geranium ...      85   16.0   \n",
       "4    Powerful aromas of lychee, mango and peach giv...      85   22.0   \n",
       "..                                                 ...     ...    ...   \n",
       "388  Dense with alluring aromas, this wine is full ...      88   30.0   \n",
       "389  A grape known in Uruguay and Madiran has prove...      88   25.0   \n",
       "390  Peach and steely lemon aromas carry to a citru...      87   28.0   \n",
       "391  The nose of this wine is bursting with raspber...      89   32.0   \n",
       "392  Somehow, winemaker Luca Paschina manages to ma...      87   23.0   \n",
       "\n",
       "     province      region         taster_name taster_twitter_handle  \\\n",
       "0    Virginia    Virginia  Alexander Peartree                  None   \n",
       "1    Virginia    Virginia  Alexander Peartree                  None   \n",
       "2    Victoria  Rutherglen      Joe Czerwinski                @JoeCz   \n",
       "3    Virginia    Virginia        Carrie Dykes                  None   \n",
       "4    Virginia  Middleburg        Carrie Dykes                  None   \n",
       "..        ...         ...                 ...                   ...   \n",
       "388  Virginia    Virginia        Carrie Dykes                  None   \n",
       "389  Virginia    Virginia        Carrie Dykes                  None   \n",
       "390  Virginia  Monticello  Alexander Peartree                  None   \n",
       "391  Virginia  Monticello        Carrie Dykes                  None   \n",
       "392  Virginia    Virginia        Carrie Dykes                  None   \n",
       "\n",
       "                                                 title         variety  \\\n",
       "0                 Quiévremont 2012 Meritage (Virginia)        Meritage   \n",
       "1        Quiévremont 2012 Vin de Maison Red (Virginia)        R. Blend   \n",
       "2    Chambers Rosewood Vineyards NV Rare Muscat (Ru...          Muscat   \n",
       "3    The Williamsburg Winery 2015 A Midsummer Night...     White Blend   \n",
       "4         Blue Valley 2015 Muskat Ottonel (Middleburg)  Muskat Ottonel   \n",
       "..                                                 ...             ...   \n",
       "388       Early Mountain 2015 Elevation Red (Virginia)        R. Blend   \n",
       "389                      Horton 2014 Tannat (Virginia)          Tannat   \n",
       "390        Pollak 2012 Reserve Chardonnay (Monticello)      Chardonnay   \n",
       "391                 Stinson 2014 Meritage (Monticello)        Meritage   \n",
       "392  Barboursville Vineyards 2015 Reserve Vermentin...      Vermentino   \n",
       "\n",
       "                          winery  \n",
       "0                    Quiévremont  \n",
       "1                    Quiévremont  \n",
       "2    Chambers Rosewood Vineyards  \n",
       "3        The Williamsburg Winery  \n",
       "4                    Blue Valley  \n",
       "..                           ...  \n",
       "388               Early Mountain  \n",
       "389                       Horton  \n",
       "390                       Pollak  \n",
       "391                      Stinson  \n",
       "392      Barboursville Vineyards  \n",
       "\n",
       "[393 rows x 13 columns]"
      ]
     },
     "execution_count": 78,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = {'$or': [{'points': 100}, {'province': 'Virginia'}]}\n",
    "mongo_read_query(winecollection, myquery)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`$expr` requires an operator that compares two keys, and creates a sentence like \"X is greater than or equal to Y\". Next it requires a list that specifies what X in the sentence should be, then what Y should be. To search for all wines in which the price is greater than the score we use the `$expr` operator as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>wine_id</th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "      <th>location</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5ed80dcca25fcf746119e3d4</td>\n",
       "      <td>60.0</td>\n",
       "      <td>US</td>\n",
       "      <td>Syrupy and dense, this wine is jammy in plum a...</td>\n",
       "      <td>86.0</td>\n",
       "      <td>100</td>\n",
       "      <td>California</td>\n",
       "      <td>Napa Valley</td>\n",
       "      <td>Virginie Boone</td>\n",
       "      <td>@vboone</td>\n",
       "      <td>Okapi 2013 Estate Cabernet Sauvignon (Napa Val...</td>\n",
       "      <td>Cabernet Sauvignon</td>\n",
       "      <td>Okapi</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5ed80dcca25fcf746119e42c</td>\n",
       "      <td>168.0</td>\n",
       "      <td>US</td>\n",
       "      <td>A fairly elegant expression of the variety, th...</td>\n",
       "      <td>91.0</td>\n",
       "      <td>95</td>\n",
       "      <td>California</td>\n",
       "      <td>Napa Valley</td>\n",
       "      <td>Virginie Boone</td>\n",
       "      <td>@vboone</td>\n",
       "      <td>Duckhorn 2012 Rector Creek Vineyard Merlot (Na...</td>\n",
       "      <td>Merlot</td>\n",
       "      <td>Duckhorn</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5ed80dcca25fcf746119e476</td>\n",
       "      <td>284.0</td>\n",
       "      <td>Argentina</td>\n",
       "      <td>This huge Malbec defines jammy and concentrate...</td>\n",
       "      <td>92.0</td>\n",
       "      <td>215</td>\n",
       "      <td>Mendoza Province</td>\n",
       "      <td>Perdriel</td>\n",
       "      <td>Michael Schachner</td>\n",
       "      <td>@wineschach</td>\n",
       "      <td>Viña Cobos 2011 Marchiori Vineyard Block C2 Ma...</td>\n",
       "      <td>Malbec</td>\n",
       "      <td>Viña Cobos</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5ed80dcca25fcf746119e498</td>\n",
       "      <td>345.0</td>\n",
       "      <td>Australia</td>\n",
       "      <td>This wine contains some material over 100 year...</td>\n",
       "      <td>100.0</td>\n",
       "      <td>350</td>\n",
       "      <td>Victoria</td>\n",
       "      <td>Rutherglen</td>\n",
       "      <td>Joe Czerwinski</td>\n",
       "      <td>@JoeCz</td>\n",
       "      <td>Chambers Rosewood Vineyards NV Rare Muscat (Ru...</td>\n",
       "      <td>Muscat</td>\n",
       "      <td>Chambers Rosewood Vineyards</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5ed80dcca25fcf746119e499</td>\n",
       "      <td>346.0</td>\n",
       "      <td>Australia</td>\n",
       "      <td>This deep brown wine smells like a damp, mossy...</td>\n",
       "      <td>98.0</td>\n",
       "      <td>350</td>\n",
       "      <td>Victoria</td>\n",
       "      <td>Rutherglen</td>\n",
       "      <td>Joe Czerwinski</td>\n",
       "      <td>@JoeCz</td>\n",
       "      <td>Chambers Rosewood Vineyards NV Rare Muscadelle...</td>\n",
       "      <td>Muscadelle</td>\n",
       "      <td>Chambers Rosewood Vineyards</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3697</th>\n",
       "      <td>5ed80dcfa25fcf74611b78a8</td>\n",
       "      <td>129919.0</td>\n",
       "      <td>US</td>\n",
       "      <td>This ripe, rich, almost decadently thick wine ...</td>\n",
       "      <td>91.0</td>\n",
       "      <td>105</td>\n",
       "      <td>Washington</td>\n",
       "      <td>Walla Walla Valley (WA)</td>\n",
       "      <td>Paul Gregutt</td>\n",
       "      <td>@paulgwine</td>\n",
       "      <td>Nicholas Cole Cellars 2004 Reserve Red (Walla ...</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Nicholas Cole Cellars</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3698</th>\n",
       "      <td>5ed80dcfa25fcf74611b78b2</td>\n",
       "      <td>129931.0</td>\n",
       "      <td>France</td>\n",
       "      <td>A powerful, chunky wine, packed with solid tan...</td>\n",
       "      <td>91.0</td>\n",
       "      <td>107</td>\n",
       "      <td>Burgundy</td>\n",
       "      <td>Grands-Echezeaux</td>\n",
       "      <td>Roger Voss</td>\n",
       "      <td>@vossroger</td>\n",
       "      <td>Henri de Villamont 2005  Grands-Echezeaux</td>\n",
       "      <td>Pinot Noir</td>\n",
       "      <td>Henri de Villamont</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3699</th>\n",
       "      <td>5edd56e5b4e58ce3841e5dea</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>This wine goes great with dinner just like Dwy...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>35</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>2016 Napa Valley Three By Wade Red Blend</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'region_1': 'Napa Valley', 'region_2': None, ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3700</th>\n",
       "      <td>5edd56e6b4e58ce3841e5deb</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>This wine will make you speak differently. May...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>40.99</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>Anta Banderas A 10 2008</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'region_1': 'Ribera del Duoro', 'region_2': N...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3701</th>\n",
       "      <td>5edd56e6b4e58ce3841e5dec</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Someone drank my entire bottle of wine!</td>\n",
       "      <td>NaN</td>\n",
       "      <td>14.99</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>Barrymore Rose 2013</td>\n",
       "      <td>Rose</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'region_1': 'Monterey', 'region_2': None, 'pr...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>3702 rows × 14 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                           _id   wine_id    country  \\\n",
       "0     5ed80dcca25fcf746119e3d4      60.0         US   \n",
       "1     5ed80dcca25fcf746119e42c     168.0         US   \n",
       "2     5ed80dcca25fcf746119e476     284.0  Argentina   \n",
       "3     5ed80dcca25fcf746119e498     345.0  Australia   \n",
       "4     5ed80dcca25fcf746119e499     346.0  Australia   \n",
       "...                        ...       ...        ...   \n",
       "3697  5ed80dcfa25fcf74611b78a8  129919.0         US   \n",
       "3698  5ed80dcfa25fcf74611b78b2  129931.0     France   \n",
       "3699  5edd56e5b4e58ce3841e5dea       NaN        NaN   \n",
       "3700  5edd56e6b4e58ce3841e5deb       NaN        NaN   \n",
       "3701  5edd56e6b4e58ce3841e5dec       NaN        NaN   \n",
       "\n",
       "                                            description  points  price  \\\n",
       "0     Syrupy and dense, this wine is jammy in plum a...    86.0    100   \n",
       "1     A fairly elegant expression of the variety, th...    91.0     95   \n",
       "2     This huge Malbec defines jammy and concentrate...    92.0    215   \n",
       "3     This wine contains some material over 100 year...   100.0    350   \n",
       "4     This deep brown wine smells like a damp, mossy...    98.0    350   \n",
       "...                                                 ...     ...    ...   \n",
       "3697  This ripe, rich, almost decadently thick wine ...    91.0    105   \n",
       "3698  A powerful, chunky wine, packed with solid tan...    91.0    107   \n",
       "3699  This wine goes great with dinner just like Dwy...     NaN     35   \n",
       "3700  This wine will make you speak differently. May...     NaN  40.99   \n",
       "3701            Someone drank my entire bottle of wine!     NaN  14.99   \n",
       "\n",
       "              province                   region        taster_name  \\\n",
       "0           California              Napa Valley     Virginie Boone   \n",
       "1           California              Napa Valley     Virginie Boone   \n",
       "2     Mendoza Province                 Perdriel  Michael Schachner   \n",
       "3             Victoria               Rutherglen     Joe Czerwinski   \n",
       "4             Victoria               Rutherglen     Joe Czerwinski   \n",
       "...                ...                      ...                ...   \n",
       "3697        Washington  Walla Walla Valley (WA)       Paul Gregutt   \n",
       "3698          Burgundy         Grands-Echezeaux         Roger Voss   \n",
       "3699               NaN                      NaN    Jonathan Kropko   \n",
       "3700               NaN                      NaN    Jonathan Kropko   \n",
       "3701               NaN                      NaN    Jonathan Kropko   \n",
       "\n",
       "     taster_twitter_handle                                              title  \\\n",
       "0                  @vboone  Okapi 2013 Estate Cabernet Sauvignon (Napa Val...   \n",
       "1                  @vboone  Duckhorn 2012 Rector Creek Vineyard Merlot (Na...   \n",
       "2              @wineschach  Viña Cobos 2011 Marchiori Vineyard Block C2 Ma...   \n",
       "3                   @JoeCz  Chambers Rosewood Vineyards NV Rare Muscat (Ru...   \n",
       "4                   @JoeCz  Chambers Rosewood Vineyards NV Rare Muscadelle...   \n",
       "...                    ...                                                ...   \n",
       "3697           @paulgwine   Nicholas Cole Cellars 2004 Reserve Red (Walla ...   \n",
       "3698            @vossroger          Henri de Villamont 2005  Grands-Echezeaux   \n",
       "3699              @jmk5131           2016 Napa Valley Three By Wade Red Blend   \n",
       "3700              @jmk5131                            Anta Banderas A 10 2008   \n",
       "3701              @jmk5131                                Barrymore Rose 2013   \n",
       "\n",
       "                 variety                       winery  \\\n",
       "0     Cabernet Sauvignon                        Okapi   \n",
       "1                 Merlot                     Duckhorn   \n",
       "2                 Malbec                   Viña Cobos   \n",
       "3                 Muscat  Chambers Rosewood Vineyards   \n",
       "4             Muscadelle  Chambers Rosewood Vineyards   \n",
       "...                  ...                          ...   \n",
       "3697            R. Blend        Nicholas Cole Cellars   \n",
       "3698          Pinot Noir           Henri de Villamont   \n",
       "3699           Red Blend                          NaN   \n",
       "3700           Red Blend                          NaN   \n",
       "3701                Rose                          NaN   \n",
       "\n",
       "                                               location  \n",
       "0                                                   NaN  \n",
       "1                                                   NaN  \n",
       "2                                                   NaN  \n",
       "3                                                   NaN  \n",
       "4                                                   NaN  \n",
       "...                                                 ...  \n",
       "3697                                                NaN  \n",
       "3698                                                NaN  \n",
       "3699  {'region_1': 'Napa Valley', 'region_2': None, ...  \n",
       "3700  {'region_1': 'Ribera del Duoro', 'region_2': N...  \n",
       "3701  {'region_1': 'Monterey', 'region_2': None, 'pr...  \n",
       "\n",
       "[3702 rows x 14 columns]"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = {'$expr': {'$gt': ['$price', '$points']}}\n",
    "mongo_read_query(winecollection, myquery)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the previous section, we entered a record for a wine released by former NBA all-star Dwyane Wade, and we purposely included a nested structure in this JSON record. The `location` key has subkeys `region_1`, `region_2`, `province`, `country`, and `winery`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'title': '2016 Napa Valley Three By Wade Red Blend',\n",
       " 'description': 'This wine goes great with dinner just like Dwyane Wade goes great with LeBron James or Shaq.',\n",
       " 'taster_name': 'Jonathan Kropko',\n",
       " 'taster_twitter_handle': '@jmk5131',\n",
       " 'price': '35',\n",
       " 'variety': 'Red Blend',\n",
       " 'location': {'region_1': 'Napa Valley',\n",
       "  'region_2': None,\n",
       "  'province': 'California',\n",
       "  'country': 'U.S.',\n",
       "  'winery': 'D Wade Cellars'},\n",
       " '_id': ObjectId('5edd56e5b4e58ce3841e5dea')}"
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dwadewine"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To query a subrecord, use dot notation of the form `'key.subkey'` to identify the path to the value you need. To query for the winery name \"D Wade Cellars\", we can type:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>title</th>\n",
       "      <th>description</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>price</th>\n",
       "      <th>variety</th>\n",
       "      <th>location</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5edd56e5b4e58ce3841e5dea</td>\n",
       "      <td>2016 Napa Valley Three By Wade Red Blend</td>\n",
       "      <td>This wine goes great with dinner just like Dwy...</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>35</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>{'region_1': 'Napa Valley', 'region_2': None, ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        _id                                     title  \\\n",
       "0  5edd56e5b4e58ce3841e5dea  2016 Napa Valley Three By Wade Red Blend   \n",
       "\n",
       "                                         description      taster_name  \\\n",
       "0  This wine goes great with dinner just like Dwy...  Jonathan Kropko   \n",
       "\n",
       "  taster_twitter_handle price    variety  \\\n",
       "0              @jmk5131    35  Red Blend   \n",
       "\n",
       "                                            location  \n",
       "0  {'region_1': 'Napa Valley', 'region_2': None, ...  "
      ]
     },
     "execution_count": 81,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "myquery = {'location.winery': 'D Wade Cellars'}\n",
    "mongo_read_query(winecollection, myquery)"
   ]
  },
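  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One way to picture what the dot notation does is to trace the path by hand through a Python dictionary. The sketch below is only an illustration: `get_by_path` is a hypothetical helper written for this example, not a function from `pymongo`, and `example_record` is an abbreviated copy of the record displayed above. MongoDB resolves the path on the server; this code only mimics the idea for nested dictionaries.\n",
    "\n",
    "```python\n",
    "def get_by_path(record, path):\n",
    "    # Follow a dotted path such as 'location.winery' one key at a time\n",
    "    value = record\n",
    "    for key in path.split('.'):\n",
    "        # Give up if the path leaves the nested dictionaries\n",
    "        if not isinstance(value, dict) or key not in value:\n",
    "            return None\n",
    "        value = value[key]\n",
    "    return value\n",
    "\n",
    "# Abbreviated, hand-copied version of the Dwyane Wade record (illustration only)\n",
    "example_record = {'title': '2016 Napa Valley Three By Wade Red Blend',\n",
    "                  'location': {'region_1': 'Napa Valley',\n",
    "                               'province': 'California',\n",
    "                               'winery': 'D Wade Cellars'}}\n",
    "\n",
    "get_by_path(example_record, 'location.winery')\n",
    "```\n",
    "\n",
    "The path is resolved from left to right, so `'location.winery'` first looks up `location` and then looks up `winery` inside it. In this sketch a path that does not exist yields `None`."
   ]
  },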
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Selecting Features\n",
    "A read query in MongoDB will return the entire JSON dictionary for every record that matches the query. Sometimes, however, the entirety of the data for one record will be more information than we can feasibly work with. In some situations there might be an unmanageable number of features contained within each dictionary, and we only want to use a couple of these features. In other situations a feature might contain values that are so large that we want to avoid dealing with this feature if possible.\n",
    "\n",
    "To extract only a selection of the features, pass a second JSON clause to the `.find()` method. The general syntax for selecting features is\n",
    "```\n",
    "db.collection.find({query}, {'feature': 1})\n",
    "```\n",
    "where `{query}` is code, as described above, for extracting a selection of the records, and `{'feature': 1}` instructs MongoDB to include only the field named `feature` in the output (plus the `_id` field, which is returned by default). We can list as many keys in this second clause as we want, so `{'feature1': 1, 'feature2': 1}` extracts `feature1` and `feature2`. Setting a key equal to 0 instead of 1 instructs MongoDB to extract all features *except* the one specified with `'feature': 0`. A single clause cannot mix 1s and 0s, with the exception of the `_id` field.\n",
    "\n",
    "In the wine collection, we can extract only the titles of Merlot wines with the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>title</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5ed80dcca25fcf746119e3c1</td>\n",
       "      <td>Bianchi 2011 Signature Selection Merlot (Paso ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5ed80dcca25fcf746119e3cd</td>\n",
       "      <td>Sundance 2011 Merlot (Maule Valley)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5ed80dcca25fcf746119e3ef</td>\n",
       "      <td>Passaggio 2014 Blau Vineyards Merlot (Knights ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5ed80dcca25fcf746119e42c</td>\n",
       "      <td>Duckhorn 2012 Rector Creek Vineyard Merlot (Na...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5ed80dcca25fcf746119e437</td>\n",
       "      <td>Viña Bisquertt 2007 Casa La Joya Reserve Merlo...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2094</th>\n",
       "      <td>5ed80dcfa25fcf74611b77f0</td>\n",
       "      <td>Castillo de Monjardin 2009 Deyo Merlot (Navarra)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2095</th>\n",
       "      <td>5ed80dcfa25fcf74611b7862</td>\n",
       "      <td>Bonair 2006 Chateau Puryear Vineyard Merlot (R...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2096</th>\n",
       "      <td>5ed80dcfa25fcf74611b7865</td>\n",
       "      <td>Hyatt 2005 Merlot (Rattlesnake Hills)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2097</th>\n",
       "      <td>5ed80dcfa25fcf74611b786b</td>\n",
       "      <td>Ca' Momi 2013 Reserve Merlot (Carneros)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2098</th>\n",
       "      <td>5ed80dcfa25fcf74611b7896</td>\n",
       "      <td>Psagot 2014 Merlot</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2099 rows × 2 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                           _id  \\\n",
       "0     5ed80dcca25fcf746119e3c1   \n",
       "1     5ed80dcca25fcf746119e3cd   \n",
       "2     5ed80dcca25fcf746119e3ef   \n",
       "3     5ed80dcca25fcf746119e42c   \n",
       "4     5ed80dcca25fcf746119e437   \n",
       "...                        ...   \n",
       "2094  5ed80dcfa25fcf74611b77f0   \n",
       "2095  5ed80dcfa25fcf74611b7862   \n",
       "2096  5ed80dcfa25fcf74611b7865   \n",
       "2097  5ed80dcfa25fcf74611b786b   \n",
       "2098  5ed80dcfa25fcf74611b7896   \n",
       "\n",
       "                                                  title  \n",
       "0     Bianchi 2011 Signature Selection Merlot (Paso ...  \n",
       "1                   Sundance 2011 Merlot (Maule Valley)  \n",
       "2     Passaggio 2014 Blau Vineyards Merlot (Knights ...  \n",
       "3     Duckhorn 2012 Rector Creek Vineyard Merlot (Na...  \n",
       "4     Viña Bisquertt 2007 Casa La Joya Reserve Merlo...  \n",
       "...                                                 ...  \n",
       "2094   Castillo de Monjardin 2009 Deyo Merlot (Navarra)  \n",
       "2095  Bonair 2006 Chateau Puryear Vineyard Merlot (R...  \n",
       "2096              Hyatt 2005 Merlot (Rattlesnake Hills)  \n",
       "2097            Ca' Momi 2013 Reserve Merlot (Carneros)  \n",
       "2098                                 Psagot 2014 Merlot  \n",
       "\n",
       "[2099 rows x 2 columns]"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cursor = winecollection.find({'variety': 'Merlot'}, {'title': 1})\n",
    "qtext = dumps(cursor)\n",
    "qrec = loads(qtext)\n",
    "pd.DataFrame.from_records(qrec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "By default, `_id` is returned in addition to the fields we directly specify. We can exclude `_id` as well by setting it to 0 in the same clause:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>title</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Bianchi 2011 Signature Selection Merlot (Paso ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Sundance 2011 Merlot (Maule Valley)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Passaggio 2014 Blau Vineyards Merlot (Knights ...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Duckhorn 2012 Rector Creek Vineyard Merlot (Na...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>Viña Bisquertt 2007 Casa La Joya Reserve Merlo...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2094</th>\n",
       "      <td>Castillo de Monjardin 2009 Deyo Merlot (Navarra)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2095</th>\n",
       "      <td>Bonair 2006 Chateau Puryear Vineyard Merlot (R...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2096</th>\n",
       "      <td>Hyatt 2005 Merlot (Rattlesnake Hills)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2097</th>\n",
       "      <td>Ca' Momi 2013 Reserve Merlot (Carneros)</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2098</th>\n",
       "      <td>Psagot 2014 Merlot</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2099 rows × 1 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                                  title\n",
       "0     Bianchi 2011 Signature Selection Merlot (Paso ...\n",
       "1                   Sundance 2011 Merlot (Maule Valley)\n",
       "2     Passaggio 2014 Blau Vineyards Merlot (Knights ...\n",
       "3     Duckhorn 2012 Rector Creek Vineyard Merlot (Na...\n",
       "4     Viña Bisquertt 2007 Casa La Joya Reserve Merlo...\n",
       "...                                                 ...\n",
       "2094   Castillo de Monjardin 2009 Deyo Merlot (Navarra)\n",
       "2095  Bonair 2006 Chateau Puryear Vineyard Merlot (R...\n",
       "2096              Hyatt 2005 Merlot (Rattlesnake Hills)\n",
       "2097            Ca' Momi 2013 Reserve Merlot (Carneros)\n",
       "2098                                 Psagot 2014 Merlot\n",
       "\n",
       "[2099 rows x 1 columns]"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cursor = winecollection.find({'variety': 'Merlot'}, {'title': 1, '_id': 0})\n",
    "qtext = dumps(cursor)\n",
    "qrec = loads(qtext)\n",
    "pd.DataFrame.from_records(qrec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To keep the title, variety, points, and price, we type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>87</td>\n",
       "      <td>22.0</td>\n",
       "      <td>Bianchi 2011 Signature Selection Merlot (Paso ...</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>86</td>\n",
       "      <td>9.0</td>\n",
       "      <td>Sundance 2011 Merlot (Maule Valley)</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>86</td>\n",
       "      <td>55.0</td>\n",
       "      <td>Passaggio 2014 Blau Vineyards Merlot (Knights ...</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>91</td>\n",
       "      <td>95.0</td>\n",
       "      <td>Duckhorn 2012 Rector Creek Vineyard Merlot (Na...</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>88</td>\n",
       "      <td>11.0</td>\n",
       "      <td>Viña Bisquertt 2007 Casa La Joya Reserve Merlo...</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2094</th>\n",
       "      <td>87</td>\n",
       "      <td>18.0</td>\n",
       "      <td>Castillo de Monjardin 2009 Deyo Merlot (Navarra)</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2095</th>\n",
       "      <td>86</td>\n",
       "      <td>20.0</td>\n",
       "      <td>Bonair 2006 Chateau Puryear Vineyard Merlot (R...</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2096</th>\n",
       "      <td>86</td>\n",
       "      <td>10.0</td>\n",
       "      <td>Hyatt 2005 Merlot (Rattlesnake Hills)</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2097</th>\n",
       "      <td>90</td>\n",
       "      <td>44.0</td>\n",
       "      <td>Ca' Momi 2013 Reserve Merlot (Carneros)</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2098</th>\n",
       "      <td>91</td>\n",
       "      <td>32.0</td>\n",
       "      <td>Psagot 2014 Merlot</td>\n",
       "      <td>Merlot</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>2099 rows × 4 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      points  price                                              title variety\n",
       "0         87   22.0  Bianchi 2011 Signature Selection Merlot (Paso ...  Merlot\n",
       "1         86    9.0                Sundance 2011 Merlot (Maule Valley)  Merlot\n",
       "2         86   55.0  Passaggio 2014 Blau Vineyards Merlot (Knights ...  Merlot\n",
       "3         91   95.0  Duckhorn 2012 Rector Creek Vineyard Merlot (Na...  Merlot\n",
       "4         88   11.0  Viña Bisquertt 2007 Casa La Joya Reserve Merlo...  Merlot\n",
       "...      ...    ...                                                ...     ...\n",
       "2094      87   18.0   Castillo de Monjardin 2009 Deyo Merlot (Navarra)  Merlot\n",
       "2095      86   20.0  Bonair 2006 Chateau Puryear Vineyard Merlot (R...  Merlot\n",
       "2096      86   10.0              Hyatt 2005 Merlot (Rattlesnake Hills)  Merlot\n",
       "2097      90   44.0            Ca' Momi 2013 Reserve Merlot (Carneros)  Merlot\n",
       "2098      91   32.0                                 Psagot 2014 Merlot  Merlot\n",
       "\n",
       "[2099 rows x 4 columns]"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cursor = winecollection.find({'variety': 'Merlot'}, \n",
    "                             {'title': 1,\n",
    "                             'variety': 1,\n",
    "                             'points': 1,\n",
    "                             'price': 1,\n",
    "                             '_id': 0})\n",
    "qtext = dumps(cursor)\n",
    "qrec = loads(qtext)\n",
    "pd.DataFrame.from_records(qrec)"
   ]
  },
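  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The inclusion and exclusion rules used above can be summarized in a few lines of plain Python. This is a rough sketch under simplifying assumptions, not MongoDB's implementation: `project` is a hypothetical helper, it handles only top-level fields, and it does not raise the error that MongoDB raises when 1s and 0s are mixed for fields other than `_id`. The `merlot` record is hand-copied from the last row of the output above, with `_id` stored as a plain string.\n",
    "\n",
    "```python\n",
    "def project(record, projection):\n",
    "    # Flags for every field except '_id' decide between inclusion and exclusion\n",
    "    flags = {key: flag for key, flag in projection.items() if key != '_id'}\n",
    "    if any(flag == 1 for flag in flags.values()):\n",
    "        # Inclusion: keep only the fields flagged with 1\n",
    "        kept = {key: value for key, value in record.items() if flags.get(key) == 1}\n",
    "    else:\n",
    "        # Exclusion: keep everything except '_id' and the fields flagged with 0\n",
    "        kept = {key: value for key, value in record.items()\n",
    "                if key != '_id' and key not in flags}\n",
    "    # '_id' is kept unless the projection sets it to 0\n",
    "    if projection.get('_id', 1) == 1 and '_id' in record:\n",
    "        kept = {'_id': record['_id'], **kept}\n",
    "    return kept\n",
    "\n",
    "# Hand-copied example record (illustration only)\n",
    "merlot = {'_id': '5ed80dcfa25fcf74611b7896', 'title': 'Psagot 2014 Merlot',\n",
    "          'variety': 'Merlot', 'points': 91, 'price': 32.0}\n",
    "\n",
    "project(merlot, {'title': 1})             # keeps '_id' and 'title'\n",
    "project(merlot, {'title': 1, '_id': 0})   # keeps 'title' only\n",
    "project(merlot, {'price': 0})             # keeps every field except 'price'\n",
    "```\n",
    "\n",
    "The three calls at the bottom mirror the three kinds of second clause discussed in this section: including a field, including a field while dropping `_id`, and excluding a field."
   ]
  },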
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Updating Records\n",
    "Updating records in MongoDB is similar to selecting records: we identify the records we want to edit with the same logical conditions we use in read queries. The `.update_one()` method, applied to a collection, takes two arguments. The first is a logical condition that identifies the record we want to edit; if more than one record matches, `.update_one()` changes only one of them. The second uses the `$set` operator to choose specific fields within the existing JSON record and give them new values. If we want, we can even pass `$set` a dictionary that overwrites every field in the record. For example, to identify the record of the wine from Dwyane Wade's winery, we can query `{'location.winery': 'D Wade Cellars'}` as we did in the previous section. Suppose that we want to edit this record so that the price increases to \\\\$45. We can do so with the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>title</th>\n",
       "      <th>description</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>price</th>\n",
       "      <th>variety</th>\n",
       "      <th>location</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5edd56e5b4e58ce3841e5dea</td>\n",
       "      <td>2016 Napa Valley Three By Wade Red Blend</td>\n",
       "      <td>This wine goes great with dinner just like Dwy...</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>45</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>{'region_1': 'Napa Valley', 'region_2': None, ...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        _id                                     title  \\\n",
       "0  5edd56e5b4e58ce3841e5dea  2016 Napa Valley Three By Wade Red Blend   \n",
       "\n",
       "                                         description      taster_name  \\\n",
       "0  This wine goes great with dinner just like Dwy...  Jonathan Kropko   \n",
       "\n",
       "  taster_twitter_handle  price    variety  \\\n",
       "0              @jmk5131     45  Red Blend   \n",
       "\n",
       "                                            location  \n",
       "0  {'region_1': 'Napa Valley', 'region_2': None, ...  "
      ]
     },
     "execution_count": 85,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "winecollection.update_one({'location.winery': 'D Wade Cellars'},\n",
    "                     {'$set' : {'price': 45}})\n",
    "mongo_read_query(winecollection, {'location.winery': 'D Wade Cellars'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose that we want to add a field that does not currently exist in the record, such as `score`. The `$set` operator creates a field if it is missing, so we can use the same syntax to add fields:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>title</th>\n",
       "      <th>description</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>price</th>\n",
       "      <th>variety</th>\n",
       "      <th>location</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5edd56e5b4e58ce3841e5dea</td>\n",
       "      <td>2016 Napa Valley Three By Wade Red Blend</td>\n",
       "      <td>This wine goes great with dinner just like Dwy...</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>45</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>{'region_1': 'Napa Valley', 'region_2': None, ...</td>\n",
       "      <td>90</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        _id                                     title  \\\n",
       "0  5edd56e5b4e58ce3841e5dea  2016 Napa Valley Three By Wade Red Blend   \n",
       "\n",
       "                                         description      taster_name  \\\n",
       "0  This wine goes great with dinner just like Dwy...  Jonathan Kropko   \n",
       "\n",
       "  taster_twitter_handle  price    variety  \\\n",
       "0              @jmk5131     45  Red Blend   \n",
       "\n",
       "                                            location  score  \n",
       "0  {'region_1': 'Napa Valley', 'region_2': None, ...     90  "
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "winecollection.update_one({'location.winery': 'D Wade Cellars'},\n",
    "                     {'$set' : {'score': 90}})\n",
    "mongo_read_query(winecollection, {'location.winery': 'D Wade Cellars'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can change more than one field at a time within one call to the `$set` operator. To change both the score and the price of the Dwyane Wade wine, we can type:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>title</th>\n",
       "      <th>description</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>price</th>\n",
       "      <th>variety</th>\n",
       "      <th>location</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5edd56e5b4e58ce3841e5dea</td>\n",
       "      <td>2016 Napa Valley Three By Wade Red Blend</td>\n",
       "      <td>This wine goes great with dinner just like Dwy...</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>50</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>{'region_1': 'Napa Valley', 'region_2': None, ...</td>\n",
       "      <td>95</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        _id                                     title  \\\n",
       "0  5edd56e5b4e58ce3841e5dea  2016 Napa Valley Three By Wade Red Blend   \n",
       "\n",
       "                                         description      taster_name  \\\n",
       "0  This wine goes great with dinner just like Dwy...  Jonathan Kropko   \n",
       "\n",
       "  taster_twitter_handle  price    variety  \\\n",
       "0              @jmk5131     50  Red Blend   \n",
       "\n",
       "                                            location  score  \n",
       "0  {'region_1': 'Napa Valley', 'region_2': None, ...     95  "
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "winecollection.update_one({'location.winery': 'D Wade Cellars'},\n",
    "                     {'$set' : {'score': 95,\n",
    "                               'price': 50}})\n",
    "mongo_read_query(winecollection, {'location.winery': 'D Wade Cellars'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Suppose that the wine is reviewed by LeBron James, NBA star and noted [wine connoisseur](https://twitter.com/KingJames/status/1239424365621469184?s=20), who provided a new score and description. We can update the entire record by first defining a Python variable that contains the record:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [],
   "source": [
    "dwadewine2 = {'title': '2016 Napa Valley Three By Wade Red Blend', \n",
    "'description': \"This wine is very good. Not as great as me. But plenty great enough for Miami.\", \n",
    "'taster_name': 'LeBron James', \n",
    "'taster_twitter_handle': '@kingjames', \n",
    "'price': 45,\n",
    "'score': 99,\n",
    "'variety': 'Red Blend', \n",
    "'location':{\n",
    "    'region_1': 'Napa Valley', \n",
    "    'region_2': None, \n",
    "    'province': 'California', \n",
    "    'country': 'U.S.', \n",
    "    'winery': 'D Wade Cellars'}}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can replace the existing record for this wine by specifying this dictionary as the second argument of the `.update_one()` method:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>title</th>\n",
       "      <th>description</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>price</th>\n",
       "      <th>variety</th>\n",
       "      <th>location</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5edd56e5b4e58ce3841e5dea</td>\n",
       "      <td>2016 Napa Valley Three By Wade Red Blend</td>\n",
       "      <td>This wine is very good. Not as great as me. Bu...</td>\n",
       "      <td>LeBron James</td>\n",
       "      <td>@kingjames</td>\n",
       "      <td>45</td>\n",
       "      <td>Red Blend</td>\n",
       "      <td>{'region_1': 'Napa Valley', 'region_2': None, ...</td>\n",
       "      <td>99</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        _id                                     title  \\\n",
       "0  5edd56e5b4e58ce3841e5dea  2016 Napa Valley Three By Wade Red Blend   \n",
       "\n",
       "                                         description   taster_name  \\\n",
       "0  This wine is very good. Not as great as me. Bu...  LeBron James   \n",
       "\n",
       "  taster_twitter_handle  price    variety  \\\n",
       "0            @kingjames     45  Red Blend   \n",
       "\n",
       "                                            location  score  \n",
       "0  {'region_1': 'Napa Valley', 'region_2': None, ...     99  "
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "winecollection.update_one({'location.winery': 'D Wade Cellars'},\n",
    "                     {'$set' : dwadewine2})\n",
    "mongo_read_query(winecollection, {'location.winery': 'D Wade Cellars'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A second method for editing records is `.update_all()` which revises every document that matches a query. I don't recommend using this method except in very specific cases, because it is easy to destroy large portions of a database with a mistyped query. But for the sake of illustration, suppose we wanted to change the names of the \"Red Blend\" varieties of wines to \"R. Blend\". We can do that with the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>_id</th>\n",
       "      <th>wine_id</th>\n",
       "      <th>country</th>\n",
       "      <th>description</th>\n",
       "      <th>points</th>\n",
       "      <th>price</th>\n",
       "      <th>province</th>\n",
       "      <th>region</th>\n",
       "      <th>taster_name</th>\n",
       "      <th>taster_twitter_handle</th>\n",
       "      <th>title</th>\n",
       "      <th>variety</th>\n",
       "      <th>winery</th>\n",
       "      <th>location</th>\n",
       "      <th>score</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>5ed80dcca25fcf746119e3be</td>\n",
       "      <td>20.0</td>\n",
       "      <td>US</td>\n",
       "      <td>Ripe aromas of dark berries mingle with ample ...</td>\n",
       "      <td>87.0</td>\n",
       "      <td>23</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Virginia</td>\n",
       "      <td>Alexander Peartree</td>\n",
       "      <td>None</td>\n",
       "      <td>Quiévremont 2012 Vin de Maison Red (Virginia)</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Quiévremont</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>5ed80dcca25fcf746119e3c6</td>\n",
       "      <td>28.0</td>\n",
       "      <td>Italy</td>\n",
       "      <td>Aromas suggest mature berry, scorched earth, a...</td>\n",
       "      <td>87.0</td>\n",
       "      <td>17</td>\n",
       "      <td>Sicily &amp; Sardinia</td>\n",
       "      <td>Cerasuolo di Vittoria</td>\n",
       "      <td>Kerin O’Keefe</td>\n",
       "      <td>@kerinokeefe</td>\n",
       "      <td>Terre di Giurfo 2011 Mascaria Barricato  (Cera...</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Terre di Giurfo</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>5ed80dcca25fcf746119e3dc</td>\n",
       "      <td>68.0</td>\n",
       "      <td>US</td>\n",
       "      <td>Very deep in color and spicy-smoky in flavor, ...</td>\n",
       "      <td>86.0</td>\n",
       "      <td>12</td>\n",
       "      <td>California</td>\n",
       "      <td>California</td>\n",
       "      <td>Jim Gordon</td>\n",
       "      <td>@gordone_cellars</td>\n",
       "      <td>Cocobon 2014 Red (California)</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Cocobon</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>5ed80dcca25fcf746119e3f2</td>\n",
       "      <td>90.0</td>\n",
       "      <td>US</td>\n",
       "      <td>This blend of Sangiovese, Malbec, Cabernet Sau...</td>\n",
       "      <td>88.0</td>\n",
       "      <td>23</td>\n",
       "      <td>California</td>\n",
       "      <td>Sonoma County</td>\n",
       "      <td>Virginie Boone</td>\n",
       "      <td>@vboone</td>\n",
       "      <td>Ferrari-Carano 2014 Siena Red (Sonoma County)</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Ferrari-Carano</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5ed80dcca25fcf746119e400</td>\n",
       "      <td>104.0</td>\n",
       "      <td>Italy</td>\n",
       "      <td>Made with 65% Sangiovese, 20% Merlot and 15% C...</td>\n",
       "      <td>87.0</td>\n",
       "      <td>16</td>\n",
       "      <td>Tuscany</td>\n",
       "      <td>Toscana</td>\n",
       "      <td>Kerin O’Keefe</td>\n",
       "      <td>@kerinokeefe</td>\n",
       "      <td>Madonna Alta 2014 Nativo Red (Toscana)</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Madonna Alta</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7106</th>\n",
       "      <td>5ed80dcfa25fcf74611b78b3</td>\n",
       "      <td>129932.0</td>\n",
       "      <td>Argentina</td>\n",
       "      <td>Andeluna's top wines tend to be ripe and plump...</td>\n",
       "      <td>91.0</td>\n",
       "      <td>55</td>\n",
       "      <td>Mendoza Province</td>\n",
       "      <td>Uco Valley</td>\n",
       "      <td>Michael Schachner</td>\n",
       "      <td>@wineschach</td>\n",
       "      <td>Andeluna 2004 Pasionado Red (Uco Valley)</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Andeluna</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7107</th>\n",
       "      <td>5ed80dcfa25fcf74611b78bd</td>\n",
       "      <td>129943.0</td>\n",
       "      <td>Italy</td>\n",
       "      <td>A blend of Nero d'Avola and Syrah, this convey...</td>\n",
       "      <td>90.0</td>\n",
       "      <td>29</td>\n",
       "      <td>Sicily &amp; Sardinia</td>\n",
       "      <td>Sicilia</td>\n",
       "      <td>Kerin O’Keefe</td>\n",
       "      <td>@kerinokeefe</td>\n",
       "      <td>Baglio del Cristo di Campobello 2012 Adènzia R...</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Baglio del Cristo di Campobello</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7108</th>\n",
       "      <td>5ed80dcfa25fcf74611b78c1</td>\n",
       "      <td>129947.0</td>\n",
       "      <td>Italy</td>\n",
       "      <td>A blend of 65% Cabernet Sauvignon, 30% Merlot ...</td>\n",
       "      <td>90.0</td>\n",
       "      <td>20</td>\n",
       "      <td>Sicily &amp; Sardinia</td>\n",
       "      <td>Terre Siciliane</td>\n",
       "      <td>Kerin O’Keefe</td>\n",
       "      <td>@kerinokeefe</td>\n",
       "      <td>Feudo Principi di Butera 2012 Symposio Red (Te...</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>Feudo Principi di Butera</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7109</th>\n",
       "      <td>5edd56e5b4e58ce3841e5dea</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>This wine is very good. Not as great as me. Bu...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>45</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>LeBron James</td>\n",
       "      <td>@kingjames</td>\n",
       "      <td>2016 Napa Valley Three By Wade Red Blend</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'region_1': 'Napa Valley', 'region_2': None, ...</td>\n",
       "      <td>99.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7110</th>\n",
       "      <td>5edd56e6b4e58ce3841e5deb</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>This wine will make you speak differently. May...</td>\n",
       "      <td>NaN</td>\n",
       "      <td>40.99</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>Jonathan Kropko</td>\n",
       "      <td>@jmk5131</td>\n",
       "      <td>Anta Banderas A 10 2008</td>\n",
       "      <td>R. Blend</td>\n",
       "      <td>NaN</td>\n",
       "      <td>{'region_1': 'Ribera del Duoro', 'region_2': N...</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>7111 rows × 15 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                           _id   wine_id    country  \\\n",
       "0     5ed80dcca25fcf746119e3be      20.0         US   \n",
       "1     5ed80dcca25fcf746119e3c6      28.0      Italy   \n",
       "2     5ed80dcca25fcf746119e3dc      68.0         US   \n",
       "3     5ed80dcca25fcf746119e3f2      90.0         US   \n",
       "4     5ed80dcca25fcf746119e400     104.0      Italy   \n",
       "...                        ...       ...        ...   \n",
       "7106  5ed80dcfa25fcf74611b78b3  129932.0  Argentina   \n",
       "7107  5ed80dcfa25fcf74611b78bd  129943.0      Italy   \n",
       "7108  5ed80dcfa25fcf74611b78c1  129947.0      Italy   \n",
       "7109  5edd56e5b4e58ce3841e5dea       NaN        NaN   \n",
       "7110  5edd56e6b4e58ce3841e5deb       NaN        NaN   \n",
       "\n",
       "                                            description  points  price  \\\n",
       "0     Ripe aromas of dark berries mingle with ample ...    87.0     23   \n",
       "1     Aromas suggest mature berry, scorched earth, a...    87.0     17   \n",
       "2     Very deep in color and spicy-smoky in flavor, ...    86.0     12   \n",
       "3     This blend of Sangiovese, Malbec, Cabernet Sau...    88.0     23   \n",
       "4     Made with 65% Sangiovese, 20% Merlot and 15% C...    87.0     16   \n",
       "...                                                 ...     ...    ...   \n",
       "7106  Andeluna's top wines tend to be ripe and plump...    91.0     55   \n",
       "7107  A blend of Nero d'Avola and Syrah, this convey...    90.0     29   \n",
       "7108  A blend of 65% Cabernet Sauvignon, 30% Merlot ...    90.0     20   \n",
       "7109  This wine is very good. Not as great as me. Bu...     NaN     45   \n",
       "7110  This wine will make you speak differently. May...     NaN  40.99   \n",
       "\n",
       "               province                 region         taster_name  \\\n",
       "0              Virginia               Virginia  Alexander Peartree   \n",
       "1     Sicily & Sardinia  Cerasuolo di Vittoria       Kerin O’Keefe   \n",
       "2            California             California          Jim Gordon   \n",
       "3            California          Sonoma County      Virginie Boone   \n",
       "4               Tuscany                Toscana       Kerin O’Keefe   \n",
       "...                 ...                    ...                 ...   \n",
       "7106   Mendoza Province             Uco Valley   Michael Schachner   \n",
       "7107  Sicily & Sardinia                Sicilia       Kerin O’Keefe   \n",
       "7108  Sicily & Sardinia        Terre Siciliane       Kerin O’Keefe   \n",
       "7109                NaN                    NaN        LeBron James   \n",
       "7110                NaN                    NaN     Jonathan Kropko   \n",
       "\n",
       "     taster_twitter_handle                                              title  \\\n",
       "0                     None      Quiévremont 2012 Vin de Maison Red (Virginia)   \n",
       "1             @kerinokeefe  Terre di Giurfo 2011 Mascaria Barricato  (Cera...   \n",
       "2         @gordone_cellars                      Cocobon 2014 Red (California)   \n",
       "3                  @vboone      Ferrari-Carano 2014 Siena Red (Sonoma County)   \n",
       "4             @kerinokeefe             Madonna Alta 2014 Nativo Red (Toscana)   \n",
       "...                    ...                                                ...   \n",
       "7106           @wineschach           Andeluna 2004 Pasionado Red (Uco Valley)   \n",
       "7107          @kerinokeefe  Baglio del Cristo di Campobello 2012 Adènzia R...   \n",
       "7108          @kerinokeefe  Feudo Principi di Butera 2012 Symposio Red (Te...   \n",
       "7109            @kingjames           2016 Napa Valley Three By Wade Red Blend   \n",
       "7110              @jmk5131                            Anta Banderas A 10 2008   \n",
       "\n",
       "       variety                           winery  \\\n",
       "0     R. Blend                      Quiévremont   \n",
       "1     R. Blend                  Terre di Giurfo   \n",
       "2     R. Blend                          Cocobon   \n",
       "3     R. Blend                   Ferrari-Carano   \n",
       "4     R. Blend                     Madonna Alta   \n",
       "...        ...                              ...   \n",
       "7106  R. Blend                         Andeluna   \n",
       "7107  R. Blend  Baglio del Cristo di Campobello   \n",
       "7108  R. Blend         Feudo Principi di Butera   \n",
       "7109  R. Blend                              NaN   \n",
       "7110  R. Blend                              NaN   \n",
       "\n",
       "                                               location  score  \n",
       "0                                                   NaN    NaN  \n",
       "1                                                   NaN    NaN  \n",
       "2                                                   NaN    NaN  \n",
       "3                                                   NaN    NaN  \n",
       "4                                                   NaN    NaN  \n",
       "...                                                 ...    ...  \n",
       "7106                                                NaN    NaN  \n",
       "7107                                                NaN    NaN  \n",
       "7108                                                NaN    NaN  \n",
       "7109  {'region_1': 'Napa Valley', 'region_2': None, ...   99.0  \n",
       "7110  {'region_1': 'Ribera del Duoro', 'region_2': N...    NaN  \n",
       "\n",
       "[7111 rows x 15 columns]"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "winecollection.update_many({'variety': 'Red Blend'},\n",
    "                          {'$set': {'variety': 'R. Blend'}})\n",
    "mongo_read_query(winecollection, {'variety': 'R. Blend'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Performing Text Searches\n",
    "One of the great advantages of a document store database is the ability to search through the text within the documents and extract records that match a certain pattern. A text search in MongoDB involves two steps:\n",
    "\n",
    "* First, we will create a **text index**: a particular field in the records that contains the text we want MongoDB to search within.\n",
    "\n",
    "* Second, we will use the `$text` operator within a call to `.find()` to specify the search terms.\n",
    "\n",
    "To create a text index, we can use the syntax\n",
    "```\n",
    "collection.create_index[('keytosearch', 'text')]\n",
    "```\n",
    "We will replace `'keytosearch' with the name of the field in the JSON dictionaries on which we want to search, but we will leave `'text'` as is because this code tells MongoDB to search for text. The code to set the `description` field as the text index in the `winecollection` database is:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 91,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'description_text'"
      ]
     },
     "execution_count": 91,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "winecollection.create_index([('description', 'text')])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we've set the text index, we can search the text in that field. The general syntax for a query with a text search is\n",
    "```\n",
    "{'$text': {'$search': 'searchterms', '$caseSensitive': False}}\n",
    "```\n",
    "where `'searchterms'` contains the terms we want to search for, and `'$caseSensitive': False` tells MongoDB to ignore cases in the search, so that a search term of \"chocolate\" also matches to \"Chocolate\". Alternatively, `'$caseSensitive': True` takes case into account when matching records to a query. If a search is not case sensitive, and if it is not diacritic sensitive (taking things like accents into account, which it can do by adding the `$diacriticSensitive=True` option), then `$search` matches on the **stems** of words: the [first several letters in the word](https://en.wikipedia.org/wiki/Stemming), allowing a search term of \"blueberry\" to also match with \"blueberries\".\n",
    "\n",
    "As a simple example, now that `description` has been set as the text index, we can find all wines with descriptions that contain the word \"chocolate\". Here I save the output as a dataframe and display the description for the first wine in the output:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 92,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Supermodern in style, with mint, coconut, chocolate and huge black fruit aromas. Powerfully structured and thick-boned, with boysenberry, spice and chocolate in spades. Oaky, broad and layered on the finish, with tobacco, coffee and chocolate finishing notes.'"
      ]
     },
     "execution_count": 92,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = mongo_read_query(winecollection, {'$text': {'$search':'chocolate', '$caseSensitive': False}})\n",
    "df['description'][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To search on more than one term, include the terms in the same string after `'$search'`, separated by spaces. By default, these terms are combined using the \"or\" operator, so that the query returns any document with at least one of the terms. The following code finds all wines whose descriptions contain \"chocolate\" or \"leather\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 93,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "5930    A whiff of leather introduces a wine with a st...\n",
       "Name: description, dtype: object"
      ]
     },
     "execution_count": 93,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = mongo_read_query(winecollection, {'$text': {'$search':'chocolate leather', '$caseSensitive': False}})\n",
    "df[df['wine_id']==109396]['description'] # a leathery one"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To search for documents with a phrase that contains a space, enclose the phrase in double quotes, and precede each double-quote with an \\ escape character. The following code captures descriptions with the phrase \"very good\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 94,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "\"A good to very good wine with medicinality on the nose that takes over the aromatics. There's also the slightest bit of hard cheese and stem on the bouquet, so overall it is fighting an uphill battle. Along the way it delivers flavors of cough drop, cherry and a good, solid finish.\""
      ]
     },
     "execution_count": 94,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = mongo_read_query(winecollection, {'$text': {'$search':'\\\"very good\\\"', '$caseSensitive': False}})\n",
    "df['description'][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To search for documents that contain multiple search terms at once (an \"and\" operator), enclose each search term in double quotes with escape characters. We can search for descriptions that contain both \"leather\" and \"chocolate\" with the following code: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'A blend of 85% Melnik, 10% Grenache Noir and 5% Petit Verdot, this wine has aromas of saddle leather, cassis and dark chocolate. In the mouth there are flavors of cherry, chocolate and dried blueberry. It has good balance with a soft tannic finish.'"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = mongo_read_query(winecollection, {'$text': {'$search':'\\\"leather\\\" \\\"chocolate\\\"', '$caseSensitive': False}})\n",
    "df['description'][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To exclude a term, we add a negative sign in front of the term we want to exclude. To find all wines whose descriptions contain the word \"dark\" but not \"chocolate\", we type"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 96,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Dark, dark, dark is this estate-grown Syrah, closed at the nose except for the profusion of alcohol, grippy with lots of oak and crazy, teeth-staining tannins.'"
      ]
     },
     "execution_count": 96,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = mongo_read_query(winecollection, {'$text': {'$search':'dark -chocolate', '$caseSensitive': False}})\n",
    "df['description'][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Text searches can also be used to construct a search engine. We provide the search terms, and MongoDB generates a score for every document that represents the extent to which the search terms are relevant to the document. Once these scores have been generated, it is possible to sort the documents by the score to find the documents that are most highly related to the search terms.\n",
    "\n",
    "To rank documents by search-relevancy, we add `{'score': {'$meta': 'textScore'}}` to the query we pass to the `.find()` method. Here we enter five search terms, \"chocolate\", \"leather\", \"wood\", \"dark\", and \"smoke\": "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 97,
   "metadata": {},
   "outputs": [],
   "source": [
    "cursor = winecollection.find(\n",
    "            {'$text': {'$search': 'chocolate leather wood dark smoke'}},\n",
    "            {'score': {'$meta': 'textScore'}})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we apply the `.sort()` method to the output, arranging the documents by relevancy score, with the following code:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<pymongo.cursor.Cursor at 0x1361d6438>"
      ]
     },
     "execution_count": 98,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "cursor.sort([('score', {'$meta': 'textScore'})]) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we convert the sorted results to a dataframe:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [],
   "source": [
    "qtext = dumps(cursor)\n",
    "qrec = loads(qtext)\n",
    "df = pd.DataFrame.from_records(qrec)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Among all the wine reviews in the data, here is the wine whose description had the highest relevancy score for \"chocolate\", \"leather\", \"wood\", \"dark\", and \"smoke\":"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Big and bold in character, this full-bodied and abundantly tannic wine looks black to the rim and smells like pencil shavings, leather and wood smoke. The flavors are dry but enticing, with dark chocolate, charred beef and black cherry. It needs until at least 2022 to peak.'"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df['description'][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To wrap up our work with the database, we close the connection to the MongoDB server by calling the `.close()` method on the client:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "metadata": {},
   "outputs": [],
   "source": [
    "myclient.close()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}