→ [1, 1, 3, 4, 5]

What is lazyq?

lazyq is a lightweight, chainable query pipeline for Python.

Instead of executing operations immediately, lazyq builds up a map of instructions and only runs them when you actually need the results. This makes it memory-efficient and great for working with large datasets!

Usage

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/vikasAWA/lazyq.git

or from pypi

$ pip install lazyq

Documentation

Documentation can be found hosted on this GitHub repository’s pages. Additionally you can find package manager specific guidelines on conda and pypi respectively.

How to use

Usage

lazyq lets you build queries step by step. Let’s explore with some country data!

Datasets

scores = [
    {'name': 'Alice', 'age': 30, 'score': 85},
    {'name': 'Bob', 'age': 17, 'score': 92},
    {'name': 'Carol', 'age': 25, 'score': 60},
    {'name': 'Dave', 'age': 19, 'score': 74},
    {'name': 'Eve', 'age': 22, 'score': 91},
]
data = [
    {'dept': 'eng', 'name': 'Alice'},
    {'dept': 'hr', 'name': 'Bob'},
    {'dept': 'eng', 'name': 'Carol'},
    {'dept': 'hr', 'name': 'Dave'},
]
sales = [
    {'region': 'north', 'product': 'laptop', 'units': 10, 'price': 999},
    {'region': 'south', 'product': 'phone',  'units': 25, 'price': 499},
    {'region': 'north', 'product': 'phone',  'units': 15, 'price': 499},
    {'region': 'south', 'product': 'laptop', 'units': 8,  'price': 999},
    {'region': 'east',  'product': 'tablet', 'units': 20, 'price': 299},
    {'region': 'east',  'product': 'laptop', 'units': 5,  'price': 999},
    {'region': 'north', 'product': 'tablet', 'units': 12, 'price': 299},
]
movies = [
    {'title': 'Inception',      'genre': 'scifi',   'year': 2010, 'rating': 8.8, 'gross_m': 836},
    {'title': 'The Dark Knight','genre': 'action',  'year': 2008, 'rating': 9.0, 'gross_m': 1005},
    {'title': 'Interstellar',   'genre': 'scifi',   'year': 2014, 'rating': 8.6, 'gross_m': 773},
    {'title': 'Parasite',       'genre': 'thriller','year': 2019, 'rating': 8.6, 'gross_m': 258},
    {'title': 'Avengers',       'genre': 'action',  'year': 2019, 'rating': 8.4, 'gross_m': 2798},
    {'title': 'Joker',          'genre': 'thriller','year': 2019, 'rating': 8.4, 'gross_m': 1079},
    {'title': 'Dune',           'genre': 'scifi',   'year': 2021, 'rating': 8.0, 'gross_m': 401},
]

Examples

Listing all available Query operations

Query.tools()

A convenience method that prints a formatted reference of all available Query operations, grouped by category:

Category What it shows
📦 Data Sources Methods to load data from iterables, CSV, SQLite, JSON, YAML, and Excel
🔄 Transformations Row-level operations: map, filter, limit, sort, groupby
📊 Aggregations Summary operations: sum, count, max, min
📤 Output collect to get a list, show to pretty-print a table

Tip: Run Query.tools() any time you need a quick reminder of what’s available — no imports or docs needed.

Query.tools()

📦 Data Sources:

from_iterable(data)                 → Create Query from list/iterable
from_csv(path)                      → Load data from CSV file
from_sqlite(db, table)              → Load table from SQLite database
from_sqlite_query(db, query)        → Run custom SQL query on SQLite database
from_json(path, key=None, stream=False) → Load data from JSON file
from_yaml(path, key=None)           → Load data from YAML file
from_excel(path, sheet_name=None)   → Load data from Excel file

🔄 Transformations:

map(fn)                             → Transform each row
filter(fn)                          → Filter rows by condition
limit(n)                            → Limit number of rows
sort(key, reverse=False)            → Sort rows
groupby(key)                        → Group rows by a key

📊 Aggregations:

sum(by=None)                        → Sum values
count()                             → Count rows
max(by=None)                        → Maximum value/row
min(by=None)                        → Minimum value/row

📤 Output:

collect(n=None)                     → Return results as list
show(n=5)                           → Pretty print results
# plain numbers
Query.from_iterable([3, 1, 4, 1, 5]).sort().collect()
[1, 1, 3, 4, 5]

Chaining filters, sorts, and display

Filter rows by a condition, sort by a field, and preview results with .show().

Query.from_iterable(scores) \
    .filter(F('age') > 19) \
    .sort(key='score', reverse=True) \
    .show()
  • F('age') > 19 — keeps only rows where age is greater than 19
  • .sort(key='score', reverse=True) — sorts by score, highest first
  • .show() — pretty-prints the top 5 rows as a table
# chained
Query.from_iterable(scores)\
    .filter(F('age') > 19)\
    .sort(key='score', reverse=True)\
    .show()
name  | age | score
-------------------
Eve   | 22  | 91   
Alice | 30  | 85   
Carol | 25  | 60   

Grouping and aggregating

Use .groupby(key) followed by .sum(field) to total a numeric field across groups.

Query.from_iterable(sales).groupby('region').sum('units').collect()
# → [('north', 37.0), ('south', 33.0), ('east', 25.0)]
  • .groupby('region') — groups rows by the 'region' field
  • .sum('units') — sums the 'units' field within each group
  • .collect() — returns results as a list of (group_key, total) tuples
Query.from_iterable(sales).groupby('region').sum('units').collect()
[('north', 37.0), ('south', 33.0), ('east', 25.0)]

Top N results from a filtered, sorted dataset

Filter rows by a field value, sort by another field, limit results, and display as a table.

Query.from_iterable(movies) \
    .filter(F('genre') == 'scifi') \
    .sort('rating', reverse=True) \
    .limit(3) \
    .show()
Step What it does
filter(F('genre') == 'scifi') Keeps only rows where genre is 'scifi'
.sort('rating', reverse=True) Sorts by rating, highest first
.limit(3) Takes only the top 3 results
.show() Pretty-prints results as a table

Output:

title        | genre | year | rating | gross_m
----------------------------------------------
Inception    | scifi | 2010 | 8.8    | 836
Interstellar | scifi | 2014 | 8.6    | 773
Dune         | scifi | 2021 | 8.0    | 401

Tip: You can swap 'scifi' for 'action' or 'thriller', change the sort field to 'gross_m', or adjust limit(3) to get different slices of your data.


Query.from_iterable(movies) \
    .filter(F('genre') == 'scifi') \
    .sort('rating', reverse=True) \
    .limit(3) \
    .show()
title        | genre | year | rating | gross_m
----------------------------------------------
Inception    | scifi | 2010 | 8.8    | 836    
Interstellar | scifi | 2014 | 8.6    | 773    
Dune         | scifi | 2021 | 8.0    | 401    

Loading and previewing data from a SQLite database

Use Query.from_sqlite() to read directly from a SQLite table and preview the results.

Query.from_sqlite('shows.db', 'shows').show()
Step What it does
from_sqlite('shows.db', 'shows') Connects to shows.db and reads all rows from the shows table
.show() Pretty-prints the first 5 rows as a table

Output:

id    | title                       | year | episodes
-----------------------------------------------------
62614 | Zeg 'ns Aaa                 | 1981 | 227
63881 | Catweazle                   | 1970 | 26
...

Tip: You can chain further operations after from_sqlite(), for example:

Query.from_sqlite('shows.db', 'shows') \
    .filter(F('episodes').not_null()) \
    .filter(F('year') > 1975) \
    .sort('episodes', reverse=True) \
    .show()

For custom SQL, use Query.from_sqlite_query(db, query) instead.


shows_table = Query.from_sqlite('../data/shows.db', 'shows')
shows_table.show()
id    | title                       | year | episodes
-----------------------------------------------------
62614 | Zeg 'ns Aaa                 | 1981 | 227     
63881 | Catweazle                   | 1970 | 26      
63962 | UFO                         | 1970 | 26      
65269 | Ace of Wands                | 1970 | 46      
65270 | The Adventures of Don Quick | 1970 | 6       
shows_table \
    .filter(F('year').not_null()) \
    .filter(F('episodes').not_null()) \
    .show()
id    | title                       | year | episodes
-----------------------------------------------------
62614 | Zeg 'ns Aaa                 | 1981 | 227     
63881 | Catweazle                   | 1970 | 26      
63962 | UFO                         | 1970 | 26      
65269 | Ace of Wands                | 1970 | 46      
65270 | The Adventures of Don Quick | 1970 | 6       

Finding the top N rows by a numeric field

Filter out nulls, sort descending, and limit to get the top results.

Query.from_sqlite('shows.db', 'shows') \
    .filter(F('episodes').not_null()) \
    .sort(key='episodes', reverse=True) \
    .limit(5) \
    .show()
Step What it does
from_sqlite('shows.db', 'shows') Reads all rows from the shows table
.filter(F('episodes').not_null()) Excludes rows where episodes is None/missing
.sort(key='episodes', reverse=True) Sorts by episode count, highest first
.limit(5) Keeps only the top 5 results
.show() Pretty-prints as a table

Output:

id       | title                      | year | episodes
-------------------------------------------------------
12164062 | NRK Nyheter                | 2007 | 18593
69658    | The Young and the Restless | 1973 | 13297
...

Tip: Replace 'episodes' with any numeric field (e.g. 'year') and adjust limit(5) to get different top-N results.


# Top 5 shows by episodes
shows_table\
    .filter(F('episodes').not_null())\
    .sort(key='episodes', reverse=True)\
    .limit(5).show()
id       | title                      | year | episodes
-------------------------------------------------------
12164062 | NRK Nyheter                | 2007 | 18593   
69658    | The Young and the Restless | 1973 | 13297   
988827   | See the World by Train     | 1987 | 10674   
33062164 | WREG News 3 at 10PM        | 1975 | 10663   
175383   | Barátok közt               | 1998 | 10456   

Counting records per group, sorted by frequency

Query.from_sqlite('shows.db', 'shows') \
    .filter(F('year').not_null()) \
    .groupby('year').count() \
    .sort(key=1, reverse=True) \
    .collect(7)
Step What it does
.filter(F('year').not_null()) Drops rows with no year
.groupby('year').count() Counts shows per year, yielding (year, count) tuples
.sort(key=1, reverse=True) Sorts by count (index 1), highest first
.collect(7) Returns top 7 results as a list

Tip: key=1 refers to the second element of the tuple (the count). Use key=0 to sort alphabetically/chronologically by the group key instead.

# Count shows per year
shows_table\
    .filter(F('year').not_null())\
    .groupby('year').count().sort(key=1, reverse=True).collect(7)
[(2020, 12705),
 (2021, 12241),
 (2018, 11646),
 (2017, 11591),
 (2019, 11534),
 (2016, 11339),
 (2015, 11121)]
print("=== Sort ===")
Query.from_iterable(scores).sort(key='age').show()

print("\n=== GroupedQuery count ===")
print(Query.from_iterable(movies).groupby('genre').count().collect())

print("\n=== GroupedQuery max ===")
print(Query.from_iterable(movies).groupby('genre').max(by='rating').collect())

print("\n=== Double iteration ===")
q = Query.from_iterable(scores).limit(3)
print(q.collect())
print(q.collect())
=== Sort ===
name  | age | score
-------------------
Bob   | 17  | 92   
Dave  | 19  | 74   
Eve   | 22  | 91   
Carol | 25  | 60   
Alice | 30  | 85   

=== GroupedQuery count ===
[('scifi', 3), ('action', 2), ('thriller', 2)]

=== GroupedQuery max ===
[('scifi', {'title': 'Inception', 'genre': 'scifi', 'year': 2010, 'rating': 8.8, 'gross_m': 836}), ('action', {'title': 'The Dark Knight', 'genre': 'action', 'year': 2008, 'rating': 9.0, 'gross_m': 1005}), ('thriller', {'title': 'Parasite', 'genre': 'thriller', 'year': 2019, 'rating': 8.6, 'gross_m': 258})]

=== Double iteration ===
[{'name': 'Alice', 'age': 30, 'score': 85}, {'name': 'Bob', 'age': 17, 'score': 92}, {'name': 'Carol', 'age': 25, 'score': 60}]
[{'name': 'Alice', 'age': 30, 'score': 85}, {'name': 'Bob', 'age': 17, 'score': 92}, {'name': 'Carol', 'age': 25, 'score': 60}]

Top shows by episode count, grouped by year

Step What it does
.filter(F('episodes').not_null()) Skips shows with no episode count
.groupby('year').max(by='episodes') Finds the most-episodes show per year, yielding (year, row) tuples
.sort(key=lambda x: x[1]['episodes'], reverse=True) Sorts groups by their max episode count, highest first
.collect(4) Returns the top 4 results as a list

Tip: The lambda x: x[1]['episodes'] key drills into the nested row dict inside each (year, row) tuple. Use key=lambda x: x[0] to sort by year instead.

#Top shows by episode count, grouped by year
shows_table\
    .filter(F('episodes').not_null()) \
    .groupby('year').max(by='episodes') \
    .sort(key=lambda x: x[1]['episodes'], reverse=True) \
    .collect(5)
[(2007,
  {'id': 12164062, 'title': 'NRK Nyheter', 'year': 2007, 'episodes': 18593}),
 (1973,
  {'id': 69658,
   'title': 'The Young and the Restless',
   'year': 1973,
   'episodes': 13297}),
 (1987,
  {'id': 988827,
   'title': 'See the World by Train',
   'year': 1987,
   'episodes': 10674}),
 (1975,
  {'id': 33062164,
   'title': 'WREG News 3 at 10PM',
   'year': 1975,
   'episodes': 10663}),
 (1998,
  {'id': 175383, 'title': 'Barátok közt', 'year': 1998, 'episodes': 10456})]
#total episodes per year (top 5 years)
shows_table\
    .filter(F('episodes').not_null())\
    .groupby('year').sum(by='episodes').sort(key=1, reverse=True).limit(5).collect()
[(2012, 324658.0),
 (2011, 313426.0),
 (2017, 301113.0),
 (2013, 287416.0),
 (2014, 276226.0)]

Listing all tables in a SQLite database

Query.from_sqlite_query('shows.db',
    "SELECT name FROM sqlite_master WHERE type='table'"
).collect()
Step What it does
from_sqlite_query(db, sql) Runs a raw SQL query instead of reading a full table
.collect() Returns all results as a list of dicts

Tip: Use from_sqlite_query() whenever you need custom SQL — joins, subqueries, or schema inspection like this example.

tables = Query.from_sqlite_query('../data/shows.db', "SELECT name FROM sqlite_master WHERE type='table'").collect()
tables
[{'name': 'genres'},
 {'name': 'people'},
 {'name': 'ratings'},
 {'name': 'shows'},
 {'name': 'stars'},
 {'name': 'writers'}]
ratings = Query.from_sqlite('../data/shows.db', 'ratings')
ratings.show()
show_id | rating | votes
------------------------
62614   | 6.7    | 392  
63881   | 7.9    | 1224 
63962   | 7.9    | 4518 
65269   | 7.5    | 141  
65270   | 7.5    | 33   

Joining tables with custom SQL

joined = Query.from_sqlite_query('shows.db', """
    SELECT s.title, s.year, s.episodes, r.rating, r.votes
    FROM shows s JOIN ratings r ON s.id = r.show_id
""")
joined.show()
Step What it does
from_sqlite_query(db, sql) Runs a SQL JOIN across shows and ratings tables
ON s.id = r.show_id Matches each show to its rating by ID
.show() Previews the first 5 rows of the merged result

Tip: Once joined, you can chain any Query operations — e.g. .filter(F('rating') > 8.0).sort('votes', reverse=True) to find highly-rated, well-known shows.

joined = Query.from_sqlite_query('../data/shows.db', """
    SELECT s.title, s.year, s.episodes, r.rating, r.votes
    FROM shows s JOIN ratings r ON s.id = r.show_id
""")
joined.show()
title                       | year | episodes | rating | votes
--------------------------------------------------------------
Zeg 'ns Aaa                 | 1981 | 227      | 6.7    | 392  
Catweazle                   | 1970 | 26       | 7.9    | 1224 
UFO                         | 1970 | 26       | 7.9    | 4518 
Ace of Wands                | 1970 | 46       | 7.5    | 141  
The Adventures of Don Quick | 1970 | 6        | 7.5    | 33   
# Top 10 highest rated shows (with at least 1000 votes)
joined.filter(F('votes') > 1000).sort(key='rating', reverse=True).limit(10).show(10)
title                      | year | episodes | rating | votes  
---------------------------------------------------------------
Coke Studio Pakistan       | 2008 | 109      | 9.6    | 1373   
Amrutham                   | 2001 | 61       | 9.6    | 1029   
Choufli Hal                | 2005 | 135      | 9.6    | 3265   
Geografens testamente      | 2011 | 38       | 9.6    | 1040   
Friday Five Sharp          | 2015 | 74       | 9.6    | 4571   
Breaking Bad               | 2008 | 62       | 9.5    | 2413272
Call Me Kevin              | 2016 | 3        | 9.5    | 1949   
The Why Files              | 2020 | 210      | 9.5    | 1142   
Upanishad Ganga            | 2012 | 50       | 9.5    | 1265   
The Surrealists' Top Chart | 1984 | 33       | 9.4    | 1787   

Most voted shows with episode data

joined \
    .filter(F('episodes').not_null()) \
    .sort(key='votes', reverse=True) \
    .limit(5) \
    .show()
Step What it does
.filter(F('episodes').not_null()) Excludes shows with missing episode counts
.sort(key='votes', reverse=True) Sorts by vote count, most voted first
.limit(5) Keeps the top 5 results
.show() Pretty-prints as a table

Tip: Combine with a rating filter — e.g. .filter(F('votes') > 500000 & F('rating') > 8.5) — to find both massively popular and critically acclaimed shows.

# Most voted shows
joined.filter(F('episodes').not_null()).sort(key='votes', reverse=True).limit(5).show()
title            | year | episodes | rating | votes  
-----------------------------------------------------
Game of Thrones  | 2011 | 74       | 9.2    | 2491529
Breaking Bad     | 2008 | 62       | 9.5    | 2413272
Stranger Things  | 2016 | 42       | 8.6    | 1493344
The Walking Dead | 2010 | 177      | 8.1    | 1165465
Friends          | 1994 | 235      | 8.9    | 1164587
for t in tables:
    print(f"\n=== {t['name']} ===")
    Query.from_sqlite('../data/shows.db', t['name']).show(3)

=== genres ===
show_id | genre    
-------------------
62614   | Comedy   
63881   | Adventure
63881   | Comedy   

=== people ===
id | name            | birth
----------------------------
1  | Fred Astaire    | 1899 
2  | Lauren Bacall   | 1924 
3  | Brigitte Bardot | 1934 

=== ratings ===
show_id | rating | votes
------------------------
62614   | 6.7    | 392  
63881   | 7.9    | 1224 
63962   | 7.9    | 4518 

=== shows ===
id    | title       | year | episodes
-------------------------------------
62614 | Zeg 'ns Aaa | 1981 | 227     
63881 | Catweazle   | 1970 | 26      
63962 | UFO         | 1970 | 26      

=== stars ===
show_id | person_id
-------------------
62614   | 853892   
62614   | 393936   
62614   | 393936   

=== writers ===
show_id | person_id
-------------------
62614   | 887216   
62614   | 688754   
63881   | 139441   

Best rated show per genre

Query.from_sqlite_query('shows.db', sql_q)
    .filter(F('votes') > 1000)
    .groupby('genre').max(by='rating')
    .collect()

Think of this like sorting a pile of movies into genre buckets (Action, Comedy, Drama…), then picking the highest-rated show from each bucket. We also require at least 1,000 votes so obscure shows don’t sneak in with a perfect score from 5 friends.


# joining shows, ratings and genre on show_id
sql_q ="""
    SELECT s.title, s.year, g.genre, r.rating, r.votes
    FROM shows s 
    JOIN ratings r ON s.id = r.show_id
    JOIN genres g ON s.id = g.show_id
"""
# Best rated shows by genre
Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\
   .groupby('genre').max(by='rating').collect(3)
[('Adventure',
  {'title': 'Geografens testamente',
   'year': 2011,
   'genre': 'Adventure',
   'rating': 9.6,
   'votes': 1040}),
 ('Comedy',
  {'title': 'Amrutham',
   'year': 2001,
   'genre': 'Comedy',
   'rating': 9.6,
   'votes': 1029}),
 ('Family',
  {'title': "De'ah Da'iah",
   'year': 2008,
   'genre': 'Family',
   'rating': 9.4,
   'votes': 1208})]

Same query, but sorted alphabetically (Z → A)

...max(by='rating').sort(reverse=True).collect()

This is identical to the query above, with one addition: .sort(reverse=True) arranges the results alphabetically by genre name, from Z to A (Western → Action). Without it, genres appear in the order they were first encountered in the data.

# Best rated shows by genre
Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\
   .groupby('genre').max(by='rating').sort(reverse=True).collect(3) # short alphbatically in reverse order
[('Western',
  {'title': 'Yellowstone',
   'year': 2018,
   'genre': 'Western',
   'rating': 8.6,
   'votes': 291730}),
 ('War',
  {'title': 'The Palestinian Alienation',
   'year': 2004,
   'genre': 'War',
   'rating': 9.4,
   'votes': 1589}),
 ('Thriller',
  {'title': 'Breaking Bad',
   'year': 2008,
   'genre': 'Thriller',
   'rating': 9.5,
   'votes': 2413272})]

Most prolific actors using SQL aggregation

Query.from_sqlite_query('shows.db', """
    SELECT p.name, COUNT(*) as show_count
    FROM people p JOIN stars s ON p.id = s.person_id
    GROUP BY p.id ORDER BY show_count DESC LIMIT 10
""").show()
Step What it does
JOIN stars s ON p.id = s.person_id Links each person to the shows they starred in
COUNT(*) as show_count Counts number of shows per person
GROUP BY p.id Aggregates one row per person
ORDER BY show_count DESC LIMIT 10 Returns top 10 most-appearing actors
.show() Pretty-prints the results

Tip: This query uses SQL-side aggregation for performance — ideal when working with large tables. You can swap stars for writers to find the most prolific writers instead.

# Most prolific actors
Query.from_sqlite_query('../data/shows.db', """
    SELECT p.name, COUNT(*) as show_count
    FROM people p JOIN stars s ON p.id = s.person_id
    GROUP BY p.id ORDER BY show_count DESC LIMIT 10
""").show()
name              | show_count
------------------------------
Frank Welker      | 449       
Dee Bradley Baker | 320       
Grey DeLisle      | 312       
Tom Kenny         | 300       
Jeff Bennett      | 292