scores = [
{'name': 'Alice', 'age': 30, 'score': 85},
{'name': 'Bob', 'age': 17, 'score': 92},
{'name': 'Carol', 'age': 25, 'score': 60},
{'name': 'Dave', 'age': 19, 'score': 74},
{'name': 'Eve', 'age': 22, 'score': 91},
]
data = [
{'dept': 'eng', 'name': 'Alice'},
{'dept': 'hr', 'name': 'Bob'},
{'dept': 'eng', 'name': 'Carol'},
{'dept': 'hr', 'name': 'Dave'},
]
sales = [
{'region': 'north', 'product': 'laptop', 'units': 10, 'price': 999},
{'region': 'south', 'product': 'phone', 'units': 25, 'price': 499},
{'region': 'north', 'product': 'phone', 'units': 15, 'price': 499},
{'region': 'south', 'product': 'laptop', 'units': 8, 'price': 999},
{'region': 'east', 'product': 'tablet', 'units': 20, 'price': 299},
{'region': 'east', 'product': 'laptop', 'units': 5, 'price': 999},
{'region': 'north', 'product': 'tablet', 'units': 12, 'price': 299},
]
movies = [
{'title': 'Inception', 'genre': 'scifi', 'year': 2010, 'rating': 8.8, 'gross_m': 836},
{'title': 'The Dark Knight','genre': 'action', 'year': 2008, 'rating': 9.0, 'gross_m': 1005},
{'title': 'Interstellar', 'genre': 'scifi', 'year': 2014, 'rating': 8.6, 'gross_m': 773},
{'title': 'Parasite', 'genre': 'thriller','year': 2019, 'rating': 8.6, 'gross_m': 258},
{'title': 'Avengers', 'genre': 'action', 'year': 2019, 'rating': 8.4, 'gross_m': 2798},
{'title': 'Joker', 'genre': 'thriller','year': 2019, 'rating': 8.4, 'gross_m': 1079},
{'title': 'Dune', 'genre': 'scifi', 'year': 2021, 'rating': 8.0, 'gross_m': 401},
]→ [1, 1, 3, 4, 5]
What is lazyq?
lazyq is a lightweight, chainable query pipeline for Python.
Instead of executing operations immediately, lazyq builds up a map of instructions and only runs them when you actually need the results. This makes it memory-efficient and great for working with large datasets!
Usage
Installation
Install latest from the GitHub repository:
$ pip install git+https://github.com/vikasAWA/lazyq.gitor from pypi
$ pip install lazyqDocumentation
Documentation can be found hosted on this GitHub repository’s pages. Additionally you can find package manager specific guidelines on conda and pypi respectively.
How to use
Usage
lazyq lets you build queries step by step. Let’s explore with some country data!
Datasets
Examples
Listing all available Query operations
Query.tools()A convenience method that prints a formatted reference of all available Query operations, grouped by category:
| Category | What it shows |
|---|---|
| 📦 Data Sources | Methods to load data from iterables, CSV, SQLite, JSON, YAML, and Excel |
| 🔄 Transformations | Row-level operations: map, filter, limit, sort, groupby |
| 📊 Aggregations | Summary operations: sum, count, max, min |
| 📤 Output | collect to get a list, show to pretty-print a table |
Tip: Run
Query.tools()any time you need a quick reminder of what’s available — no imports or docs needed.
Query.tools()
📦 Data Sources:
from_iterable(data) → Create Query from list/iterable
from_csv(path) → Load data from CSV file
from_sqlite(db, table) → Load table from SQLite database
from_sqlite_query(db, query) → Run custom SQL query on SQLite database
from_json(path, key=None, stream=False) → Load data from JSON file
from_yaml(path, key=None) → Load data from YAML file
from_excel(path, sheet_name=None) → Load data from Excel file
🔄 Transformations:
map(fn) → Transform each row
filter(fn) → Filter rows by condition
limit(n) → Limit number of rows
sort(key, reverse=False) → Sort rows
groupby(key) → Group rows by a key
📊 Aggregations:
sum(by=None) → Sum values
count() → Count rows
max(by=None) → Maximum value/row
min(by=None) → Minimum value/row
📤 Output:
collect(n=None) → Return results as list
show(n=5) → Pretty print results
# plain numbers
Query.from_iterable([3, 1, 4, 1, 5]).sort().collect()[1, 1, 3, 4, 5]
Chaining filters, sorts, and display
Filter rows by a condition, sort by a field, and preview results with .show().
Query.from_iterable(scores) \
.filter(F('age') > 19) \
.sort(key='score', reverse=True) \
.show()F('age') > 19— keeps only rows whereageis greater than 19.sort(key='score', reverse=True)— sorts byscore, highest first.show()— pretty-prints the top 5 rows as a table
# chained
Query.from_iterable(scores)\
.filter(F('age') > 19)\
.sort(key='score', reverse=True)\
.show()name | age | score
-------------------
Eve | 22 | 91
Alice | 30 | 85
Carol | 25 | 60
Grouping and aggregating
Use .groupby(key) followed by .sum(field) to total a numeric field across groups.
Query.from_iterable(sales).groupby('region').sum('units').collect()
# → [('north', 37.0), ('south', 33.0), ('east', 25.0)].groupby('region')— groups rows by the'region'field.sum('units')— sums the'units'field within each group.collect()— returns results as a list of(group_key, total)tuples
Query.from_iterable(sales).groupby('region').sum('units').collect()[('north', 37.0), ('south', 33.0), ('east', 25.0)]
Top N results from a filtered, sorted dataset
Filter rows by a field value, sort by another field, limit results, and display as a table.
Query.from_iterable(movies) \
.filter(F('genre') == 'scifi') \
.sort('rating', reverse=True) \
.limit(3) \
.show()| Step | What it does |
|---|---|
filter(F('genre') == 'scifi') |
Keeps only rows where genre is 'scifi' |
.sort('rating', reverse=True) |
Sorts by rating, highest first |
.limit(3) |
Takes only the top 3 results |
.show() |
Pretty-prints results as a table |
Output:
title | genre | year | rating | gross_m
----------------------------------------------
Inception | scifi | 2010 | 8.8 | 836
Interstellar | scifi | 2014 | 8.6 | 773
Dune | scifi | 2021 | 8.0 | 401
Tip: You can swap
'scifi'for'action'or'thriller', change the sort field to'gross_m', or adjustlimit(3)to get different slices of your data.
Query.from_iterable(movies) \
.filter(F('genre') == 'scifi') \
.sort('rating', reverse=True) \
.limit(3) \
.show()title | genre | year | rating | gross_m
----------------------------------------------
Inception | scifi | 2010 | 8.8 | 836
Interstellar | scifi | 2014 | 8.6 | 773
Dune | scifi | 2021 | 8.0 | 401
Loading and previewing data from a SQLite database
Use Query.from_sqlite() to read directly from a SQLite table and preview the results.
Query.from_sqlite('shows.db', 'shows').show()| Step | What it does |
|---|---|
from_sqlite('shows.db', 'shows') |
Connects to shows.db and reads all rows from the shows table |
.show() |
Pretty-prints the first 5 rows as a table |
Output:
id | title | year | episodes
-----------------------------------------------------
62614 | Zeg 'ns Aaa | 1981 | 227
63881 | Catweazle | 1970 | 26
...
Tip: You can chain further operations after
from_sqlite(), for example:Query.from_sqlite('shows.db', 'shows') \ .filter(F('episodes').not_null()) \ .filter(F('year') > 1975) \ .sort('episodes', reverse=True) \ .show()For custom SQL, use
Query.from_sqlite_query(db, query)instead.
shows_table = Query.from_sqlite('../data/shows.db', 'shows')
shows_table.show()id | title | year | episodes
-----------------------------------------------------
62614 | Zeg 'ns Aaa | 1981 | 227
63881 | Catweazle | 1970 | 26
63962 | UFO | 1970 | 26
65269 | Ace of Wands | 1970 | 46
65270 | The Adventures of Don Quick | 1970 | 6
shows_table \
.filter(F('year').not_null()) \
.filter(F('episodes').not_null()) \
.show()id | title | year | episodes
-----------------------------------------------------
62614 | Zeg 'ns Aaa | 1981 | 227
63881 | Catweazle | 1970 | 26
63962 | UFO | 1970 | 26
65269 | Ace of Wands | 1970 | 46
65270 | The Adventures of Don Quick | 1970 | 6
Finding the top N rows by a numeric field
Filter out nulls, sort descending, and limit to get the top results.
Query.from_sqlite('shows.db', 'shows') \
.filter(F('episodes').not_null()) \
.sort(key='episodes', reverse=True) \
.limit(5) \
.show()| Step | What it does |
|---|---|
from_sqlite('shows.db', 'shows') |
Reads all rows from the shows table |
.filter(F('episodes').not_null()) |
Excludes rows where episodes is None/missing |
.sort(key='episodes', reverse=True) |
Sorts by episode count, highest first |
.limit(5) |
Keeps only the top 5 results |
.show() |
Pretty-prints as a table |
Output:
id | title | year | episodes
-------------------------------------------------------
12164062 | NRK Nyheter | 2007 | 18593
69658 | The Young and the Restless | 1973 | 13297
...
Tip: Replace
'episodes'with any numeric field (e.g.'year') and adjustlimit(5)to get different top-N results.
# Top 5 shows by episodes
shows_table\
.filter(F('episodes').not_null())\
.sort(key='episodes', reverse=True)\
.limit(5).show()id | title | year | episodes
-------------------------------------------------------
12164062 | NRK Nyheter | 2007 | 18593
69658 | The Young and the Restless | 1973 | 13297
988827 | See the World by Train | 1987 | 10674
33062164 | WREG News 3 at 10PM | 1975 | 10663
175383 | Barátok közt | 1998 | 10456
Counting records per group, sorted by frequency
Query.from_sqlite('shows.db', 'shows') \
.filter(F('year').not_null()) \
.groupby('year').count() \
.sort(key=1, reverse=True) \
.collect(7)| Step | What it does |
|---|---|
.filter(F('year').not_null()) |
Drops rows with no year |
.groupby('year').count() |
Counts shows per year, yielding (year, count) tuples |
.sort(key=1, reverse=True) |
Sorts by count (index 1), highest first |
.collect(7) |
Returns top 7 results as a list |
Tip:
key=1refers to the second element of the tuple (the count). Usekey=0to sort alphabetically/chronologically by the group key instead.
# Count shows per year
shows_table\
.filter(F('year').not_null())\
.groupby('year').count().sort(key=1, reverse=True).collect(7)[(2020, 12705),
(2021, 12241),
(2018, 11646),
(2017, 11591),
(2019, 11534),
(2016, 11339),
(2015, 11121)]
print("=== Sort ===")
Query.from_iterable(scores).sort(key='age').show()
print("\n=== GroupedQuery count ===")
print(Query.from_iterable(movies).groupby('genre').count().collect())
print("\n=== GroupedQuery max ===")
print(Query.from_iterable(movies).groupby('genre').max(by='rating').collect())
print("\n=== Double iteration ===")
q = Query.from_iterable(scores).limit(3)
print(q.collect())
print(q.collect())=== Sort ===
name | age | score
-------------------
Bob | 17 | 92
Dave | 19 | 74
Eve | 22 | 91
Carol | 25 | 60
Alice | 30 | 85
=== GroupedQuery count ===
[('scifi', 3), ('action', 2), ('thriller', 2)]
=== GroupedQuery max ===
[('scifi', {'title': 'Inception', 'genre': 'scifi', 'year': 2010, 'rating': 8.8, 'gross_m': 836}), ('action', {'title': 'The Dark Knight', 'genre': 'action', 'year': 2008, 'rating': 9.0, 'gross_m': 1005}), ('thriller', {'title': 'Parasite', 'genre': 'thriller', 'year': 2019, 'rating': 8.6, 'gross_m': 258})]
=== Double iteration ===
[{'name': 'Alice', 'age': 30, 'score': 85}, {'name': 'Bob', 'age': 17, 'score': 92}, {'name': 'Carol', 'age': 25, 'score': 60}]
[{'name': 'Alice', 'age': 30, 'score': 85}, {'name': 'Bob', 'age': 17, 'score': 92}, {'name': 'Carol', 'age': 25, 'score': 60}]
Top shows by episode count, grouped by year
| Step | What it does |
|---|---|
.filter(F('episodes').not_null()) |
Skips shows with no episode count |
.groupby('year').max(by='episodes') |
Finds the most-episodes show per year, yielding (year, row) tuples |
.sort(key=lambda x: x[1]['episodes'], reverse=True) |
Sorts groups by their max episode count, highest first |
.collect(4) |
Returns the top 4 results as a list |
Tip: The
lambda x: x[1]['episodes']key drills into the nested row dict inside each(year, row)tuple. Usekey=lambda x: x[0]to sort by year instead.
#Top shows by episode count, grouped by year
shows_table\
.filter(F('episodes').not_null()) \
.groupby('year').max(by='episodes') \
.sort(key=lambda x: x[1]['episodes'], reverse=True) \
.collect(5)[(2007,
{'id': 12164062, 'title': 'NRK Nyheter', 'year': 2007, 'episodes': 18593}),
(1973,
{'id': 69658,
'title': 'The Young and the Restless',
'year': 1973,
'episodes': 13297}),
(1987,
{'id': 988827,
'title': 'See the World by Train',
'year': 1987,
'episodes': 10674}),
(1975,
{'id': 33062164,
'title': 'WREG News 3 at 10PM',
'year': 1975,
'episodes': 10663}),
(1998,
{'id': 175383, 'title': 'Barátok közt', 'year': 1998, 'episodes': 10456})]
#total episodes per year (top 5 years)
shows_table\
.filter(F('episodes').not_null())\
.groupby('year').sum(by='episodes').sort(key=1, reverse=True).limit(5).collect()[(2012, 324658.0),
(2011, 313426.0),
(2017, 301113.0),
(2013, 287416.0),
(2014, 276226.0)]
Listing all tables in a SQLite database
Query.from_sqlite_query('shows.db',
"SELECT name FROM sqlite_master WHERE type='table'"
).collect()| Step | What it does |
|---|---|
from_sqlite_query(db, sql) |
Runs a raw SQL query instead of reading a full table |
.collect() |
Returns all results as a list of dicts |
Tip: Use
from_sqlite_query()whenever you need custom SQL — joins, subqueries, or schema inspection like this example.
tables = Query.from_sqlite_query('../data/shows.db', "SELECT name FROM sqlite_master WHERE type='table'").collect()
tables[{'name': 'genres'},
{'name': 'people'},
{'name': 'ratings'},
{'name': 'shows'},
{'name': 'stars'},
{'name': 'writers'}]
ratings = Query.from_sqlite('../data/shows.db', 'ratings')
ratings.show()show_id | rating | votes
------------------------
62614 | 6.7 | 392
63881 | 7.9 | 1224
63962 | 7.9 | 4518
65269 | 7.5 | 141
65270 | 7.5 | 33
Joining tables with custom SQL
joined = Query.from_sqlite_query('shows.db', """
SELECT s.title, s.year, s.episodes, r.rating, r.votes
FROM shows s JOIN ratings r ON s.id = r.show_id
""")
joined.show()| Step | What it does |
|---|---|
from_sqlite_query(db, sql) |
Runs a SQL JOIN across shows and ratings tables |
ON s.id = r.show_id |
Matches each show to its rating by ID |
.show() |
Previews the first 5 rows of the merged result |
Tip: Once joined, you can chain any Query operations — e.g.
.filter(F('rating') > 8.0).sort('votes', reverse=True)to find highly-rated, well-known shows.
joined = Query.from_sqlite_query('../data/shows.db', """
SELECT s.title, s.year, s.episodes, r.rating, r.votes
FROM shows s JOIN ratings r ON s.id = r.show_id
""")
joined.show()title | year | episodes | rating | votes
--------------------------------------------------------------
Zeg 'ns Aaa | 1981 | 227 | 6.7 | 392
Catweazle | 1970 | 26 | 7.9 | 1224
UFO | 1970 | 26 | 7.9 | 4518
Ace of Wands | 1970 | 46 | 7.5 | 141
The Adventures of Don Quick | 1970 | 6 | 7.5 | 33
# Top 10 highest rated shows (with at least 1000 votes)
joined.filter(F('votes') > 1000).sort(key='rating', reverse=True).limit(10).show(10)title | year | episodes | rating | votes
---------------------------------------------------------------
Coke Studio Pakistan | 2008 | 109 | 9.6 | 1373
Amrutham | 2001 | 61 | 9.6 | 1029
Choufli Hal | 2005 | 135 | 9.6 | 3265
Geografens testamente | 2011 | 38 | 9.6 | 1040
Friday Five Sharp | 2015 | 74 | 9.6 | 4571
Breaking Bad | 2008 | 62 | 9.5 | 2413272
Call Me Kevin | 2016 | 3 | 9.5 | 1949
The Why Files | 2020 | 210 | 9.5 | 1142
Upanishad Ganga | 2012 | 50 | 9.5 | 1265
The Surrealists' Top Chart | 1984 | 33 | 9.4 | 1787
Most voted shows with episode data
joined \
.filter(F('episodes').not_null()) \
.sort(key='votes', reverse=True) \
.limit(5) \
.show()| Step | What it does |
|---|---|
.filter(F('episodes').not_null()) |
Excludes shows with missing episode counts |
.sort(key='votes', reverse=True) |
Sorts by vote count, most voted first |
.limit(5) |
Keeps the top 5 results |
.show() |
Pretty-prints as a table |
Tip: Combine with a rating filter — e.g.
.filter(F('votes') > 500000 & F('rating') > 8.5)— to find both massively popular and critically acclaimed shows.
# Most voted shows
joined.filter(F('episodes').not_null()).sort(key='votes', reverse=True).limit(5).show()title | year | episodes | rating | votes
-----------------------------------------------------
Game of Thrones | 2011 | 74 | 9.2 | 2491529
Breaking Bad | 2008 | 62 | 9.5 | 2413272
Stranger Things | 2016 | 42 | 8.6 | 1493344
The Walking Dead | 2010 | 177 | 8.1 | 1165465
Friends | 1994 | 235 | 8.9 | 1164587
for t in tables:
print(f"\n=== {t['name']} ===")
Query.from_sqlite('../data/shows.db', t['name']).show(3)
=== genres ===
show_id | genre
-------------------
62614 | Comedy
63881 | Adventure
63881 | Comedy
=== people ===
id | name | birth
----------------------------
1 | Fred Astaire | 1899
2 | Lauren Bacall | 1924
3 | Brigitte Bardot | 1934
=== ratings ===
show_id | rating | votes
------------------------
62614 | 6.7 | 392
63881 | 7.9 | 1224
63962 | 7.9 | 4518
=== shows ===
id | title | year | episodes
-------------------------------------
62614 | Zeg 'ns Aaa | 1981 | 227
63881 | Catweazle | 1970 | 26
63962 | UFO | 1970 | 26
=== stars ===
show_id | person_id
-------------------
62614 | 853892
62614 | 393936
62614 | 393936
=== writers ===
show_id | person_id
-------------------
62614 | 887216
62614 | 688754
63881 | 139441
Best rated show per genre
Query.from_sqlite_query('shows.db', sql_q)
.filter(F('votes') > 1000)
.groupby('genre').max(by='rating')
.collect()Think of this like sorting a pile of movies into genre buckets (Action, Comedy, Drama…), then picking the highest-rated show from each bucket. We also require at least 1,000 votes so obscure shows don’t sneak in with a perfect score from 5 friends.
# joining shows, ratings and genre on show_id
sql_q ="""
SELECT s.title, s.year, g.genre, r.rating, r.votes
FROM shows s
JOIN ratings r ON s.id = r.show_id
JOIN genres g ON s.id = g.show_id
"""# Best rated shows by genre
Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\
.groupby('genre').max(by='rating').collect(3)[('Adventure',
{'title': 'Geografens testamente',
'year': 2011,
'genre': 'Adventure',
'rating': 9.6,
'votes': 1040}),
('Comedy',
{'title': 'Amrutham',
'year': 2001,
'genre': 'Comedy',
'rating': 9.6,
'votes': 1029}),
('Family',
{'title': "De'ah Da'iah",
'year': 2008,
'genre': 'Family',
'rating': 9.4,
'votes': 1208})]
Same query, but sorted alphabetically (Z → A)
...max(by='rating').sort(reverse=True).collect()This is identical to the query above, with one addition: .sort(reverse=True) arranges the results alphabetically by genre name, from Z to A (Western → Action). Without it, genres appear in the order they were first encountered in the data.
# Best rated shows by genre
Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\
.groupby('genre').max(by='rating').sort(reverse=True).collect(3) # short alphbatically in reverse order[('Western',
{'title': 'Yellowstone',
'year': 2018,
'genre': 'Western',
'rating': 8.6,
'votes': 291730}),
('War',
{'title': 'The Palestinian Alienation',
'year': 2004,
'genre': 'War',
'rating': 9.4,
'votes': 1589}),
('Thriller',
{'title': 'Breaking Bad',
'year': 2008,
'genre': 'Thriller',
'rating': 9.5,
'votes': 2413272})]
Most prolific actors using SQL aggregation
Query.from_sqlite_query('shows.db', """
SELECT p.name, COUNT(*) as show_count
FROM people p JOIN stars s ON p.id = s.person_id
GROUP BY p.id ORDER BY show_count DESC LIMIT 10
""").show()| Step | What it does |
|---|---|
JOIN stars s ON p.id = s.person_id |
Links each person to the shows they starred in |
COUNT(*) as show_count |
Counts number of shows per person |
GROUP BY p.id |
Aggregates one row per person |
ORDER BY show_count DESC LIMIT 10 |
Returns top 10 most-appearing actors |
.show() |
Pretty-prints the results |
Tip: This query uses SQL-side aggregation for performance — ideal when working with large tables. You can swap
starsforwritersto find the most prolific writers instead.
# Most prolific actors
Query.from_sqlite_query('../data/shows.db', """
SELECT p.name, COUNT(*) as show_count
FROM people p JOIN stars s ON p.id = s.person_id
GROUP BY p.id ORDER BY show_count DESC LIMIT 10
""").show()name | show_count
------------------------------
Frank Welker | 449
Dee Bradley Baker | 320
Grey DeLisle | 312
Tom Kenny | 300
Jeff Bennett | 292