# → \[1, 1, 3, 4, 5\]


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## What is lazyq?

**lazyq** is a lightweight, chainable query pipeline for Python.

Instead of executing operations immediately, lazyq builds up a **map of
instructions** and only runs them when you actually need the results.
This makes it memory-efficient and great for working with large
datasets!

## Usage

### Installation

Install latest from the GitHub
[repository](https://github.com/vikasAWA/lazyq):

``` sh
$ pip install git+https://github.com/vikasAWA/lazyq.git
```

or from [pypi](https://pypi.org/project/lazyq/)

``` sh
$ pip install lazyq
```

### Documentation

Documentation can be found hosted on this GitHub
[repository](https://github.com/vikasAWA/lazyq)’s
[pages](https://vikasAWA.github.io/lazyq/). Additionally you can find
package manager specific guidelines on
[conda](https://anaconda.org/vikasAWA/lazyq) and
[pypi](https://pypi.org/project/lazyq/) respectively.

## How to use

## Usage

lazyq lets you build queries step by step. Let’s explore with some
country data!

## Datasets

``` python
scores = [
    {'name': 'Alice', 'age': 30, 'score': 85},
    {'name': 'Bob', 'age': 17, 'score': 92},
    {'name': 'Carol', 'age': 25, 'score': 60},
    {'name': 'Dave', 'age': 19, 'score': 74},
    {'name': 'Eve', 'age': 22, 'score': 91},
]
data = [
    {'dept': 'eng', 'name': 'Alice'},
    {'dept': 'hr', 'name': 'Bob'},
    {'dept': 'eng', 'name': 'Carol'},
    {'dept': 'hr', 'name': 'Dave'},
]
sales = [
    {'region': 'north', 'product': 'laptop', 'units': 10, 'price': 999},
    {'region': 'south', 'product': 'phone',  'units': 25, 'price': 499},
    {'region': 'north', 'product': 'phone',  'units': 15, 'price': 499},
    {'region': 'south', 'product': 'laptop', 'units': 8,  'price': 999},
    {'region': 'east',  'product': 'tablet', 'units': 20, 'price': 299},
    {'region': 'east',  'product': 'laptop', 'units': 5,  'price': 999},
    {'region': 'north', 'product': 'tablet', 'units': 12, 'price': 299},
]
movies = [
    {'title': 'Inception',      'genre': 'scifi',   'year': 2010, 'rating': 8.8, 'gross_m': 836},
    {'title': 'The Dark Knight','genre': 'action',  'year': 2008, 'rating': 9.0, 'gross_m': 1005},
    {'title': 'Interstellar',   'genre': 'scifi',   'year': 2014, 'rating': 8.6, 'gross_m': 773},
    {'title': 'Parasite',       'genre': 'thriller','year': 2019, 'rating': 8.6, 'gross_m': 258},
    {'title': 'Avengers',       'genre': 'action',  'year': 2019, 'rating': 8.4, 'gross_m': 2798},
    {'title': 'Joker',          'genre': 'thriller','year': 2019, 'rating': 8.4, 'gross_m': 1079},
    {'title': 'Dune',           'genre': 'scifi',   'year': 2021, 'rating': 8.0, 'gross_m': 401},
]
```

## Examples

**Listing all available Query operations**

``` python
Query.tools()
```

A convenience method that prints a formatted reference of all available
[`Query`](https://vikasAWA.github.io/lazyq/core.html#query) operations,
grouped by category:

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Category</th>
<th>What it shows</th>
</tr>
</thead>
<tbody>
<tr>
<td>📦 <strong>Data Sources</strong></td>
<td>Methods to load data from iterables, CSV, SQLite, JSON, YAML, and
Excel</td>
</tr>
<tr>
<td>🔄 <strong>Transformations</strong></td>
<td>Row-level operations: <code>map</code>, <code>filter</code>,
<code>limit</code>, <code>sort</code>, <code>groupby</code></td>
</tr>
<tr>
<td>📊 <strong>Aggregations</strong></td>
<td>Summary operations: <code>sum</code>, <code>count</code>,
<code>max</code>, <code>min</code></td>
</tr>
<tr>
<td>📤 <strong>Output</strong></td>
<td><code>collect</code> to get a list, <code>show</code> to
pretty-print a table</td>
</tr>
</tbody>
</table>

> **Tip:** Run
> [`Query.tools()`](https://vikasAWA.github.io/lazyq/core.html#query.tools)
> any time you need a quick reminder of what’s available — no imports or
> docs needed.

``` python
Query.tools()
```


    📦 Data Sources:

    from_iterable(data)                 → Create Query from list/iterable
    from_csv(path)                      → Load data from CSV file
    from_sqlite(db, table)              → Load table from SQLite database
    from_sqlite_query(db, query)        → Run custom SQL query on SQLite database
    from_json(path, key=None, stream=False) → Load data from JSON file
    from_yaml(path, key=None)           → Load data from YAML file
    from_excel(path, sheet_name=None)   → Load data from Excel file

    🔄 Transformations:

    map(fn)                             → Transform each row
    filter(fn)                          → Filter rows by condition
    limit(n)                            → Limit number of rows
    sort(key, reverse=False)            → Sort rows
    groupby(key)                        → Group rows by a key

    📊 Aggregations:

    sum(by=None)                        → Sum values
    count()                             → Count rows
    max(by=None)                        → Maximum value/row
    min(by=None)                        → Minimum value/row

    📤 Output:

    collect(n=None)                     → Return results as list
    show(n=5)                           → Pretty print results

``` python
# plain numbers
Query.from_iterable([3, 1, 4, 1, 5]).sort().collect()
```

    [1, 1, 3, 4, 5]

**Chaining filters, sorts, and display**

Filter rows by a condition, sort by a field, and preview results with
`.show()`.

``` python
Query.from_iterable(scores) \
    .filter(F('age') > 19) \
    .sort(key='score', reverse=True) \
    .show()
```

- `F('age') > 19` — keeps only rows where `age` is greater than 19
- `.sort(key='score', reverse=True)` — sorts by `score`, highest first
- `.show()` — pretty-prints the top 5 rows as a table

``` python
# chained
Query.from_iterable(scores)\
    .filter(F('age') > 19)\
    .sort(key='score', reverse=True)\
    .show()
```

    name  | age | score
    -------------------
    Eve   | 22  | 91   
    Alice | 30  | 85   
    Carol | 25  | 60   

**Grouping and aggregating**

Use `.groupby(key)` followed by `.sum(field)` to total a numeric field
across groups.

``` python
Query.from_iterable(sales).groupby('region').sum('units').collect()
# → [('north', 37.0), ('south', 33.0), ('east', 25.0)]
```

- `.groupby('region')` — groups rows by the `'region'` field
- `.sum('units')` — sums the `'units'` field within each group
- `.collect()` — returns results as a list of `(group_key, total)`
  tuples

``` python
Query.from_iterable(sales).groupby('region').sum('units').collect()
```

    [('north', 37.0), ('south', 33.0), ('east', 25.0)]

**Top N results from a filtered, sorted dataset**

Filter rows by a field value, sort by another field, limit results, and
display as a table.

``` python
Query.from_iterable(movies) \
    .filter(F('genre') == 'scifi') \
    .sort('rating', reverse=True) \
    .limit(3) \
    .show()
```

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Step</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>filter(F('genre') == 'scifi')</code></td>
<td>Keeps only rows where <code>genre</code> is
<code>'scifi'</code></td>
</tr>
<tr>
<td><code>.sort('rating', reverse=True)</code></td>
<td>Sorts by <code>rating</code>, highest first</td>
</tr>
<tr>
<td><code>.limit(3)</code></td>
<td>Takes only the top 3 results</td>
</tr>
<tr>
<td><code>.show()</code></td>
<td>Pretty-prints results as a table</td>
</tr>
</tbody>
</table>

**Output:**

    title        | genre | year | rating | gross_m
    ----------------------------------------------
    Inception    | scifi | 2010 | 8.8    | 836
    Interstellar | scifi | 2014 | 8.6    | 773
    Dune         | scifi | 2021 | 8.0    | 401

> **Tip:** You can swap `'scifi'` for `'action'` or `'thriller'`, change
> the sort field to `'gross_m'`, or adjust `limit(3)` to get different
> slices of your data.

------------------------------------------------------------------------

``` python
Query.from_iterable(movies) \
    .filter(F('genre') == 'scifi') \
    .sort('rating', reverse=True) \
    .limit(3) \
    .show()
```

    title        | genre | year | rating | gross_m
    ----------------------------------------------
    Inception    | scifi | 2010 | 8.8    | 836    
    Interstellar | scifi | 2014 | 8.6    | 773    
    Dune         | scifi | 2021 | 8.0    | 401    

**Loading and previewing data from a SQLite database**

Use
[`Query.from_sqlite()`](https://vikasAWA.github.io/lazyq/core.html#query.from_sqlite)
to read directly from a SQLite table and preview the results.

``` python
Query.from_sqlite('shows.db', 'shows').show()
```

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Step</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>from_sqlite('shows.db', 'shows')</code></td>
<td>Connects to <code>shows.db</code> and reads all rows from the
<code>shows</code> table</td>
</tr>
<tr>
<td><code>.show()</code></td>
<td>Pretty-prints the first 5 rows as a table</td>
</tr>
</tbody>
</table>

**Output:**

    id    | title                       | year | episodes
    -----------------------------------------------------
    62614 | Zeg 'ns Aaa                 | 1981 | 227
    63881 | Catweazle                   | 1970 | 26
    ...

> **Tip:** You can chain further operations after `from_sqlite()`, for
> example:
>
> ``` python
> Query.from_sqlite('shows.db', 'shows') \
>     .filter(F('episodes').not_null()) \
>     .filter(F('year') > 1975) \
>     .sort('episodes', reverse=True) \
>     .show()
> ```
>
> For custom SQL, use `Query.from_sqlite_query(db, query)` instead.

------------------------------------------------------------------------

``` python
shows_table = Query.from_sqlite('../data/shows.db', 'shows')
shows_table.show()
```

    id    | title                       | year | episodes
    -----------------------------------------------------
    62614 | Zeg 'ns Aaa                 | 1981 | 227     
    63881 | Catweazle                   | 1970 | 26      
    63962 | UFO                         | 1970 | 26      
    65269 | Ace of Wands                | 1970 | 46      
    65270 | The Adventures of Don Quick | 1970 | 6       

``` python
shows_table \
    .filter(F('year').not_null()) \
    .filter(F('episodes').not_null()) \
    .show()
```

    id    | title                       | year | episodes
    -----------------------------------------------------
    62614 | Zeg 'ns Aaa                 | 1981 | 227     
    63881 | Catweazle                   | 1970 | 26      
    63962 | UFO                         | 1970 | 26      
    65269 | Ace of Wands                | 1970 | 46      
    65270 | The Adventures of Don Quick | 1970 | 6       

**Finding the top N rows by a numeric field**

Filter out nulls, sort descending, and limit to get the top results.

``` python
Query.from_sqlite('shows.db', 'shows') \
    .filter(F('episodes').not_null()) \
    .sort(key='episodes', reverse=True) \
    .limit(5) \
    .show()
```

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Step</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>from_sqlite('shows.db', 'shows')</code></td>
<td>Reads all rows from the <code>shows</code> table</td>
</tr>
<tr>
<td><code>.filter(F('episodes').not_null())</code></td>
<td>Excludes rows where <code>episodes</code> is
<code>None</code>/missing</td>
</tr>
<tr>
<td><code>.sort(key='episodes', reverse=True)</code></td>
<td>Sorts by episode count, highest first</td>
</tr>
<tr>
<td><code>.limit(5)</code></td>
<td>Keeps only the top 5 results</td>
</tr>
<tr>
<td><code>.show()</code></td>
<td>Pretty-prints as a table</td>
</tr>
</tbody>
</table>

**Output:**

    id       | title                      | year | episodes
    -------------------------------------------------------
    12164062 | NRK Nyheter                | 2007 | 18593
    69658    | The Young and the Restless | 1973 | 13297
    ...

> **Tip:** Replace `'episodes'` with any numeric field (e.g. `'year'`)
> and adjust `limit(5)` to get different top-N results.

------------------------------------------------------------------------

``` python
# Top 5 shows by episodes
shows_table\
    .filter(F('episodes').not_null())\
    .sort(key='episodes', reverse=True)\
    .limit(5).show()
```

    id       | title                      | year | episodes
    -------------------------------------------------------
    12164062 | NRK Nyheter                | 2007 | 18593   
    69658    | The Young and the Restless | 1973 | 13297   
    988827   | See the World by Train     | 1987 | 10674   
    33062164 | WREG News 3 at 10PM        | 1975 | 10663   
    175383   | Barátok közt               | 1998 | 10456   

**Counting records per group, sorted by frequency**

``` python
Query.from_sqlite('shows.db', 'shows') \
    .filter(F('year').not_null()) \
    .groupby('year').count() \
    .sort(key=1, reverse=True) \
    .collect(7)
```

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Step</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>.filter(F('year').not_null())</code></td>
<td>Drops rows with no year</td>
</tr>
<tr>
<td><code>.groupby('year').count()</code></td>
<td>Counts shows per year, yielding <code>(year, count)</code>
tuples</td>
</tr>
<tr>
<td><code>.sort(key=1, reverse=True)</code></td>
<td>Sorts by count (index <code>1</code>), highest first</td>
</tr>
<tr>
<td><code>.collect(7)</code></td>
<td>Returns top 7 results as a list</td>
</tr>
</tbody>
</table>

> **Tip:** `key=1` refers to the second element of the tuple (the
> count). Use `key=0` to sort alphabetically/chronologically by the
> group key instead.

``` python
# Count shows per year
shows_table\
    .filter(F('year').not_null())\
    .groupby('year').count().sort(key=1, reverse=True).collect(7)
```

    [(2020, 12705),
     (2021, 12241),
     (2018, 11646),
     (2017, 11591),
     (2019, 11534),
     (2016, 11339),
     (2015, 11121)]

``` python
print("=== Sort ===")
Query.from_iterable(scores).sort(key='age').show()

print("\n=== GroupedQuery count ===")
print(Query.from_iterable(movies).groupby('genre').count().collect())

print("\n=== GroupedQuery max ===")
print(Query.from_iterable(movies).groupby('genre').max(by='rating').collect())

print("\n=== Double iteration ===")
q = Query.from_iterable(scores).limit(3)
print(q.collect())
print(q.collect())
```

    === Sort ===
    name  | age | score
    -------------------
    Bob   | 17  | 92   
    Dave  | 19  | 74   
    Eve   | 22  | 91   
    Carol | 25  | 60   
    Alice | 30  | 85   

    === GroupedQuery count ===
    [('scifi', 3), ('action', 2), ('thriller', 2)]

    === GroupedQuery max ===
    [('scifi', {'title': 'Inception', 'genre': 'scifi', 'year': 2010, 'rating': 8.8, 'gross_m': 836}), ('action', {'title': 'The Dark Knight', 'genre': 'action', 'year': 2008, 'rating': 9.0, 'gross_m': 1005}), ('thriller', {'title': 'Parasite', 'genre': 'thriller', 'year': 2019, 'rating': 8.6, 'gross_m': 258})]

    === Double iteration ===
    [{'name': 'Alice', 'age': 30, 'score': 85}, {'name': 'Bob', 'age': 17, 'score': 92}, {'name': 'Carol', 'age': 25, 'score': 60}]
    [{'name': 'Alice', 'age': 30, 'score': 85}, {'name': 'Bob', 'age': 17, 'score': 92}, {'name': 'Carol', 'age': 25, 'score': 60}]

**Top shows by episode count, grouped by year**

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Step</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>.filter(F('episodes').not_null())</code></td>
<td>Skips shows with no episode count</td>
</tr>
<tr>
<td><code>.groupby('year').max(by='episodes')</code></td>
<td>Finds the most-episodes show per year, yielding
<code>(year, row)</code> tuples</td>
</tr>
<tr>
<td><code>.sort(key=lambda x: x[1]['episodes'], reverse=True)</code></td>
<td>Sorts groups by their max episode count, highest first</td>
</tr>
<tr>
<td><code>.collect(4)</code></td>
<td>Returns the top 4 results as a list</td>
</tr>
</tbody>
</table>

> **Tip:** The `lambda x: x[1]['episodes']` key drills into the nested
> row dict inside each `(year, row)` tuple. Use `key=lambda x: x[0]` to
> sort by year instead.

``` python
#Top shows by episode count, grouped by year
shows_table\
    .filter(F('episodes').not_null()) \
    .groupby('year').max(by='episodes') \
    .sort(key=lambda x: x[1]['episodes'], reverse=True) \
    .collect(5)
```

    [(2007,
      {'id': 12164062, 'title': 'NRK Nyheter', 'year': 2007, 'episodes': 18593}),
     (1973,
      {'id': 69658,
       'title': 'The Young and the Restless',
       'year': 1973,
       'episodes': 13297}),
     (1987,
      {'id': 988827,
       'title': 'See the World by Train',
       'year': 1987,
       'episodes': 10674}),
     (1975,
      {'id': 33062164,
       'title': 'WREG News 3 at 10PM',
       'year': 1975,
       'episodes': 10663}),
     (1998,
      {'id': 175383, 'title': 'Barátok közt', 'year': 1998, 'episodes': 10456})]

``` python
#total episodes per year (top 5 years)
shows_table\
    .filter(F('episodes').not_null())\
    .groupby('year').sum(by='episodes').sort(key=1, reverse=True).limit(5).collect()
```

    [(2012, 324658.0),
     (2011, 313426.0),
     (2017, 301113.0),
     (2013, 287416.0),
     (2014, 276226.0)]

**Listing all tables in a SQLite database**

``` python
Query.from_sqlite_query('shows.db',
    "SELECT name FROM sqlite_master WHERE type='table'"
).collect()
```

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Step</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>from_sqlite_query(db, sql)</code></td>
<td>Runs a raw SQL query instead of reading a full table</td>
</tr>
<tr>
<td><code>.collect()</code></td>
<td>Returns all results as a list of dicts</td>
</tr>
</tbody>
</table>

> **Tip:** Use `from_sqlite_query()` whenever you need custom SQL —
> joins, subqueries, or schema inspection like this example.

``` python
tables = Query.from_sqlite_query('../data/shows.db', "SELECT name FROM sqlite_master WHERE type='table'").collect()
tables
```

    [{'name': 'genres'},
     {'name': 'people'},
     {'name': 'ratings'},
     {'name': 'shows'},
     {'name': 'stars'},
     {'name': 'writers'}]

``` python
ratings = Query.from_sqlite('../data/shows.db', 'ratings')
ratings.show()
```

    show_id | rating | votes
    ------------------------
    62614   | 6.7    | 392  
    63881   | 7.9    | 1224 
    63962   | 7.9    | 4518 
    65269   | 7.5    | 141  
    65270   | 7.5    | 33   

**Joining tables with custom SQL**

``` python
joined = Query.from_sqlite_query('shows.db', """
    SELECT s.title, s.year, s.episodes, r.rating, r.votes
    FROM shows s JOIN ratings r ON s.id = r.show_id
""")
joined.show()
```

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Step</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>from_sqlite_query(db, sql)</code></td>
<td>Runs a SQL <code>JOIN</code> across <code>shows</code> and
<code>ratings</code> tables</td>
</tr>
<tr>
<td><code>ON s.id = r.show_id</code></td>
<td>Matches each show to its rating by ID</td>
</tr>
<tr>
<td><code>.show()</code></td>
<td>Previews the first 5 rows of the merged result</td>
</tr>
</tbody>
</table>

> **Tip:** Once joined, you can chain any Query operations —
> e.g. `.filter(F('rating') > 8.0).sort('votes', reverse=True)` to find
> highly-rated, well-known shows.

``` python
joined = Query.from_sqlite_query('../data/shows.db', """
    SELECT s.title, s.year, s.episodes, r.rating, r.votes
    FROM shows s JOIN ratings r ON s.id = r.show_id
""")
joined.show()
```

    title                       | year | episodes | rating | votes
    --------------------------------------------------------------
    Zeg 'ns Aaa                 | 1981 | 227      | 6.7    | 392  
    Catweazle                   | 1970 | 26       | 7.9    | 1224 
    UFO                         | 1970 | 26       | 7.9    | 4518 
    Ace of Wands                | 1970 | 46       | 7.5    | 141  
    The Adventures of Don Quick | 1970 | 6        | 7.5    | 33   

``` python
# Top 10 highest rated shows (with at least 1000 votes)
joined.filter(F('votes') > 1000).sort(key='rating', reverse=True).limit(10).show(10)
```

    title                      | year | episodes | rating | votes  
    ---------------------------------------------------------------
    Coke Studio Pakistan       | 2008 | 109      | 9.6    | 1373   
    Amrutham                   | 2001 | 61       | 9.6    | 1029   
    Choufli Hal                | 2005 | 135      | 9.6    | 3265   
    Geografens testamente      | 2011 | 38       | 9.6    | 1040   
    Friday Five Sharp          | 2015 | 74       | 9.6    | 4571   
    Breaking Bad               | 2008 | 62       | 9.5    | 2413272
    Call Me Kevin              | 2016 | 3        | 9.5    | 1949   
    The Why Files              | 2020 | 210      | 9.5    | 1142   
    Upanishad Ganga            | 2012 | 50       | 9.5    | 1265   
    The Surrealists' Top Chart | 1984 | 33       | 9.4    | 1787   

**Most voted shows with episode data**

``` python
joined \
    .filter(F('episodes').not_null()) \
    .sort(key='votes', reverse=True) \
    .limit(5) \
    .show()
```

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Step</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>.filter(F('episodes').not_null())</code></td>
<td>Excludes shows with missing episode counts</td>
</tr>
<tr>
<td><code>.sort(key='votes', reverse=True)</code></td>
<td>Sorts by vote count, most voted first</td>
</tr>
<tr>
<td><code>.limit(5)</code></td>
<td>Keeps the top 5 results</td>
</tr>
<tr>
<td><code>.show()</code></td>
<td>Pretty-prints as a table</td>
</tr>
</tbody>
</table>

> **Tip:** Combine with a rating filter —
> e.g. `.filter(F('votes') > 500000 & F('rating') > 8.5)` — to find both
> massively popular *and* critically acclaimed shows.

``` python
# Most voted shows
joined.filter(F('episodes').not_null()).sort(key='votes', reverse=True).limit(5).show()
```

    title            | year | episodes | rating | votes  
    -----------------------------------------------------
    Game of Thrones  | 2011 | 74       | 9.2    | 2491529
    Breaking Bad     | 2008 | 62       | 9.5    | 2413272
    Stranger Things  | 2016 | 42       | 8.6    | 1493344
    The Walking Dead | 2010 | 177      | 8.1    | 1165465
    Friends          | 1994 | 235      | 8.9    | 1164587

``` python
for t in tables:
    print(f"\n=== {t['name']} ===")
    Query.from_sqlite('../data/shows.db', t['name']).show(3)
```


    === genres ===
    show_id | genre    
    -------------------
    62614   | Comedy   
    63881   | Adventure
    63881   | Comedy   

    === people ===
    id | name            | birth
    ----------------------------
    1  | Fred Astaire    | 1899 
    2  | Lauren Bacall   | 1924 
    3  | Brigitte Bardot | 1934 

    === ratings ===
    show_id | rating | votes
    ------------------------
    62614   | 6.7    | 392  
    63881   | 7.9    | 1224 
    63962   | 7.9    | 4518 

    === shows ===
    id    | title       | year | episodes
    -------------------------------------
    62614 | Zeg 'ns Aaa | 1981 | 227     
    63881 | Catweazle   | 1970 | 26      
    63962 | UFO         | 1970 | 26      

    === stars ===
    show_id | person_id
    -------------------
    62614   | 853892   
    62614   | 393936   
    62614   | 393936   

    === writers ===
    show_id | person_id
    -------------------
    62614   | 887216   
    62614   | 688754   
    63881   | 139441   

**Best rated show per genre**

``` python
Query.from_sqlite_query('shows.db', sql_q)
    .filter(F('votes') > 1000)
    .groupby('genre').max(by='rating')
    .collect()
```

Think of this like sorting a pile of movies into genre buckets (Action,
Comedy, Drama…), then picking the **highest-rated show** from each
bucket. We also require at least **1,000 votes** so obscure shows don’t
sneak in with a perfect score from 5 friends.

------------------------------------------------------------------------

``` python
# joining shows, ratings and genre on show_id
sql_q ="""
    SELECT s.title, s.year, g.genre, r.rating, r.votes
    FROM shows s 
    JOIN ratings r ON s.id = r.show_id
    JOIN genres g ON s.id = g.show_id
"""
```

``` python
# Best rated shows by genre
Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\
   .groupby('genre').max(by='rating').collect(3)
```

    [('Adventure',
      {'title': 'Geografens testamente',
       'year': 2011,
       'genre': 'Adventure',
       'rating': 9.6,
       'votes': 1040}),
     ('Comedy',
      {'title': 'Amrutham',
       'year': 2001,
       'genre': 'Comedy',
       'rating': 9.6,
       'votes': 1029}),
     ('Family',
      {'title': "De'ah Da'iah",
       'year': 2008,
       'genre': 'Family',
       'rating': 9.4,
       'votes': 1208})]

**Same query, but sorted alphabetically (Z → A)**

``` python
...max(by='rating').sort(reverse=True).collect()
```

This is identical to the query above, with one addition:
`.sort(reverse=True)` arranges the results **alphabetically by genre
name, from Z to A** (Western → Action). Without it, genres appear in the
order they were first encountered in the data.

``` python
# Best rated shows by genre
Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\
   .groupby('genre').max(by='rating').sort(reverse=True).collect(3) # short alphbatically in reverse order
```

    [('Western',
      {'title': 'Yellowstone',
       'year': 2018,
       'genre': 'Western',
       'rating': 8.6,
       'votes': 291730}),
     ('War',
      {'title': 'The Palestinian Alienation',
       'year': 2004,
       'genre': 'War',
       'rating': 9.4,
       'votes': 1589}),
     ('Thriller',
      {'title': 'Breaking Bad',
       'year': 2008,
       'genre': 'Thriller',
       'rating': 9.5,
       'votes': 2413272})]

**Most prolific actors using SQL aggregation**

``` python
Query.from_sqlite_query('shows.db', """
    SELECT p.name, COUNT(*) as show_count
    FROM people p JOIN stars s ON p.id = s.person_id
    GROUP BY p.id ORDER BY show_count DESC LIMIT 10
""").show()
```

<table>
<colgroup>
<col style="width: 50%" />
<col style="width: 50%" />
</colgroup>
<thead>
<tr>
<th>Step</th>
<th>What it does</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>JOIN stars s ON p.id = s.person_id</code></td>
<td>Links each person to the shows they starred in</td>
</tr>
<tr>
<td><code>COUNT(*) as show_count</code></td>
<td>Counts number of shows per person</td>
</tr>
<tr>
<td><code>GROUP BY p.id</code></td>
<td>Aggregates one row per person</td>
</tr>
<tr>
<td><code>ORDER BY show_count DESC LIMIT 10</code></td>
<td>Returns top 10 most-appearing actors</td>
</tr>
<tr>
<td><code>.show()</code></td>
<td>Pretty-prints the results</td>
</tr>
</tbody>
</table>

> **Tip:** This query uses SQL-side aggregation for performance — ideal
> when working with large tables. You can swap `stars` for `writers` to
> find the most prolific writers instead.

``` python
# Most prolific actors
Query.from_sqlite_query('../data/shows.db', """
    SELECT p.name, COUNT(*) as show_count
    FROM people p JOIN stars s ON p.id = s.person_id
    GROUP BY p.id ORDER BY show_count DESC LIMIT 10
""").show()
```

    name              | show_count
    ------------------------------
    Frank Welker      | 449       
    Dee Bradley Baker | 320       
    Grey DeLisle      | 312       
    Tom Kenny         | 300       
    Jeff Bennett      | 292       
