lazyq

What is lazyq?

lazyq is a lightweight, chainable query pipeline for Python.

Instead of executing operations immediately, lazyq builds up a map of instructions and only runs them when you actually need the results. This makes it memory-efficient and great for working with large datasets!

Usage

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/vikasAWA/lazyq.git

or from pypi

$ pip install lazyq

Documentation

Documentation can be found hosted on this GitHub repository’s pages. Additionally you can find package manager specific guidelines on conda and pypi respectively.

How to use

Usage

lazyq lets you build queries step by step. Let’s explore with some country data!

Datasets

scores = [
    {'name': 'Alice', 'age': 30, 'score': 85},
    {'name': 'Bob', 'age': 17, 'score': 92},
    {'name': 'Carol', 'age': 25, 'score': 60},
    {'name': 'Dave', 'age': 19, 'score': 74},
    {'name': 'Eve', 'age': 22, 'score': 91},
]
data = [
    {'dept': 'eng', 'name': 'Alice'},
    {'dept': 'hr', 'name': 'Bob'},
    {'dept': 'eng', 'name': 'Carol'},
    {'dept': 'hr', 'name': 'Dave'},
]
sales = [
    {'region': 'north', 'product': 'laptop', 'units': 10, 'price': 999},
    {'region': 'south', 'product': 'phone',  'units': 25, 'price': 499},
    {'region': 'north', 'product': 'phone',  'units': 15, 'price': 499},
    {'region': 'south', 'product': 'laptop', 'units': 8,  'price': 999},
    {'region': 'east',  'product': 'tablet', 'units': 20, 'price': 299},
    {'region': 'east',  'product': 'laptop', 'units': 5,  'price': 999},
    {'region': 'north', 'product': 'tablet', 'units': 12, 'price': 299},
]
movies = [
    {'title': 'Inception',      'genre': 'scifi',   'year': 2010, 'rating': 8.8, 'gross_m': 836},
    {'title': 'The Dark Knight','genre': 'action',  'year': 2008, 'rating': 9.0, 'gross_m': 1005},
    {'title': 'Interstellar',   'genre': 'scifi',   'year': 2014, 'rating': 8.6, 'gross_m': 773},
    {'title': 'Parasite',       'genre': 'thriller','year': 2019, 'rating': 8.6, 'gross_m': 258},
    {'title': 'Avengers',       'genre': 'action',  'year': 2019, 'rating': 8.4, 'gross_m': 2798},
    {'title': 'Joker',          'genre': 'thriller','year': 2019, 'rating': 8.4, 'gross_m': 1079},
    {'title': 'Dune',           'genre': 'scifi',   'year': 2021, 'rating': 8.0, 'gross_m': 401},
]

Examples

Listing all available Query operations

Query.tools()

A convenience method that prints a formatted reference of all available Query operations, grouped by category:

Category	What it shows
📦 Data Sources	Methods to load data from iterables, CSV, SQLite, JSON, YAML, and Excel
🔄 Transformations	Row-level operations: `map`, `filter`, `limit`, `sort`, `groupby`
📊 Aggregations	Summary operations: `sum`, `count`, `max`, `min`
📤 Output	`collect` to get a list, `show` to pretty-print a table

Tip: Run Query.tools() any time you need a quick reminder of what’s available — no imports or docs needed.

Query.tools()


📦 Data Sources:

from_iterable(data)                 → Create Query from list/iterable
from_csv(path)                      → Load data from CSV file
from_sqlite(db, table)              → Load table from SQLite database
from_sqlite_query(db, query)        → Run custom SQL query on SQLite database
from_json(path, key=None, stream=False) → Load data from JSON file
from_yaml(path, key=None)           → Load data from YAML file
from_excel(path, sheet_name=None)   → Load data from Excel file

🔄 Transformations:

map(fn)                             → Transform each row
filter(fn)                          → Filter rows by condition
limit(n)                            → Limit number of rows
sort(key, reverse=False)            → Sort rows
groupby(key)                        → Group rows by a key

📊 Aggregations:

sum(by=None)                        → Sum values
count()                             → Count rows
max(by=None)                        → Maximum value/row
min(by=None)                        → Minimum value/row

📤 Output:

collect(n=None)                     → Return results as list
show(n=5)                           → Pretty print results

Sorting plain values

Use .sort() to sort a list of plain numbers or strings in ascending order.

Query.from_iterable([3, 1, 4, 1, 5]).sort().collect()
[1, 1, 3, 4, 5]

Use .sort(reverse=True) for descending order:

Query.from_iterable([3, 1, 4, 1, 5]).sort(reverse=True).collect()
[5, 4, 3, 1, 1]

# plain numbers
Query.from_iterable([3, 1, 4, 1, 5]).sort().collect()

[1, 1, 3, 4, 5]

Chaining filters, sorts, and display

Filter rows by a condition, sort by a field, and preview results with .show().

Query.from_iterable(scores) \
    .filter(F('age') > 19) \
    .sort(key='score', reverse=True) \
    .show()

F('age') > 19 — keeps only rows where age is greater than 19
.sort(key='score', reverse=True) — sorts by score, highest first
.show() — pretty-prints the top 5 rows as a table

# chained
Query.from_iterable(scores)\
    .filter(F('age') > 19)\
    .sort(key='score', reverse=True)\
    .show()

name  | age | score
-------------------
Eve   | 22  | 91   
Alice | 30  | 85   
Carol | 25  | 60

Grouping and aggregating

Use .groupby(key) followed by .sum(field) to total a numeric field across groups.

Query.from_iterable(sales).groupby('region').sum('units').collect()
[('north', 37.0), ('south', 33.0), ('east', 25.0)]

.groupby('region') — groups rows by the 'region' field
.sum('units') — sums the 'units' field within each group
.collect() — returns results as a list of (group_key, total) tuples

Query.from_iterable(sales).groupby('region').sum('units').collect()

[('north', 37.0), ('south', 33.0), ('east', 25.0)]

Top N results from a filtered, sorted dataset

Filter rows by a field value, sort by another field, limit results, and display as a table.

Query.from_iterable(movies) \
    .filter(F('genre') == 'scifi') \
    .sort('rating', reverse=True) \
    .limit(3) \
    .show()

Step	What it does
`filter(F('genre') == 'scifi')`	Keeps only rows where `genre` is `'scifi'`
`.sort('rating', reverse=True)`	Sorts by `rating`, highest first
`.limit(3)`	Takes only the top 3 results
`.show()`	Pretty-prints results as a table

Output:

title        | genre | year | rating | gross_m
----------------------------------------------
Inception    | scifi | 2010 | 8.8    | 836
Interstellar | scifi | 2014 | 8.6    | 773
Dune         | scifi | 2021 | 8.0    | 401

Tip: You can swap 'scifi' for 'action' or 'thriller', change the sort field to 'gross_m', or adjust limit(3) to get different slices of your data.

Query.from_iterable(movies) \
    .filter(F('genre') == 'scifi') \
    .sort('rating', reverse=True) \
    .limit(3) \
    .show()

title        | genre | year | rating | gross_m
----------------------------------------------
Inception    | scifi | 2010 | 8.8    | 836    
Interstellar | scifi | 2014 | 8.6    | 773    
Dune         | scifi | 2021 | 8.0    | 401

Loading and previewing data from a SQLite database

Use Query.from_sqlite() to read directly from a SQLite table and preview the results.

Query.from_sqlite('shows.db', 'shows').show()

Step	What it does
`from_sqlite('shows.db', 'shows')`	Connects to `shows.db` and reads all rows from the `shows` table
`.show()`	Pretty-prints the first 5 rows as a table

Output:

id    | title                       | year | episodes
-----------------------------------------------------
62614 | Zeg 'ns Aaa                 | 1981 | 227
63881 | Catweazle                   | 1970 | 26
...

Tip: You can chain further operations after from_sqlite(), for example:
Query.from_sqlite('shows.db', 'shows') \
    .filter(F('episodes').not_null()) \
    .filter(F('year') > 1975) \
    .sort('episodes', reverse=True) \
    .show()
For custom SQL, use Query.from_sqlite_query(db, query) instead.

shows_table = Query.from_sqlite('../data/shows.db', 'shows')
shows_table.show()

id    | title                       | year | episodes
-----------------------------------------------------
62614 | Zeg 'ns Aaa                 | 1981 | 227     
63881 | Catweazle                   | 1970 | 26      
63962 | UFO                         | 1970 | 26      
65269 | Ace of Wands                | 1970 | 46      
65270 | The Adventures of Don Quick | 1970 | 6

shows_table \
    .filter(F('year').not_null()) \
    .filter(F('episodes').not_null()) \
    .show()

id    | title                       | year | episodes
-----------------------------------------------------
62614 | Zeg 'ns Aaa                 | 1981 | 227     
63881 | Catweazle                   | 1970 | 26      
63962 | UFO                         | 1970 | 26      
65269 | Ace of Wands                | 1970 | 46      
65270 | The Adventures of Don Quick | 1970 | 6

Finding the top N rows by a numeric field

Filter out nulls, sort descending, and limit to get the top results.

Query.from_sqlite('shows.db', 'shows') \
    .filter(F('episodes').not_null()) \
    .sort(key='episodes', reverse=True) \
    .limit(5) \
    .show()

Step	What it does
`from_sqlite('shows.db', 'shows')`	Reads all rows from the `shows` table
`.filter(F('episodes').not_null())`	Excludes rows where `episodes` is `None`/missing
`.sort(key='episodes', reverse=True)`	Sorts by episode count, highest first
`.limit(5)`	Keeps only the top 5 results
`.show()`	Pretty-prints as a table

Output:

id       | title                      | year | episodes
-------------------------------------------------------
12164062 | NRK Nyheter                | 2007 | 18593
69658    | The Young and the Restless | 1973 | 13297
...

Tip: Replace 'episodes' with any numeric field (e.g. 'year') and adjust limit(5) to get different top-N results.

# Top 5 shows by episodes
shows_table\
    .filter(F('episodes').not_null())\
    .sort(key='episodes', reverse=True)\
    .limit(5).show()

id       | title                      | year | episodes
-------------------------------------------------------
12164062 | NRK Nyheter                | 2007 | 18593   
69658    | The Young and the Restless | 1973 | 13297   
988827   | See the World by Train     | 1987 | 10674   
33062164 | WREG News 3 at 10PM        | 1975 | 10663   
175383   | Barátok közt               | 1998 | 10456

Counting records per group, sorted by frequency

Query.from_sqlite('shows.db', 'shows') \
    .filter(F('year').not_null()) \
    .groupby('year').count() \
    .sort(key=1, reverse=True) \
    .collect(7)

Step	What it does
`.filter(F('year').not_null())`	Drops rows with no year
`.groupby('year').count()`	Counts shows per year, yielding `(year, count)` tuples
`.sort(key=1, reverse=True)`	Sorts by count (index `1`), highest first
`.collect(7)`	Returns top 7 results as a list

Tip: key=1 refers to the second element of the tuple (the count). Use key=0 to sort alphabetically/chronologically by the group key instead.

# Count shows per year
shows_table\
    .filter(F('year').not_null())\
    .groupby('year').count().sort(key=1, reverse=True).collect(7)

[(2020, 12705),
 (2021, 12241),
 (2018, 11646),
 (2017, 11591),
 (2019, 11534),
 (2016, 11339),
 (2015, 11121)]

Top shows by episode count, grouped by year

Step	What it does
`.filter(F('episodes').not_null())`	Skips shows with no episode count
`.groupby('year').max(by='episodes')`	Finds the most-episodes show per year, yielding `(year, row)` tuples
`.sort(key=lambda x: x[1]['episodes'], reverse=True)`	Sorts groups by their max episode count, highest first
`.collect(4)`	Returns the top 4 results as a list

Tip: The lambda x: x[1]['episodes'] key drills into the nested row dict inside each (year, row) tuple. Use key=lambda x: x[0] to sort by year instead.

#Top shows by episode count, grouped by year
shows_table\
    .filter(F('episodes').not_null()) \
    .groupby('year').max(by='episodes') \
    .sort(key=lambda x: x[1]['episodes'], reverse=True) \
    .collect(5)

[(2007,
  {'id': 12164062, 'title': 'NRK Nyheter', 'year': 2007, 'episodes': 18593}),
 (1973,
  {'id': 69658,
   'title': 'The Young and the Restless',
   'year': 1973,
   'episodes': 13297}),
 (1987,
  {'id': 988827,
   'title': 'See the World by Train',
   'year': 1987,
   'episodes': 10674}),
 (1975,
  {'id': 33062164,
   'title': 'WREG News 3 at 10PM',
   'year': 1975,
   'episodes': 10663}),
 (1998,
  {'id': 175383, 'title': 'Barátok közt', 'year': 1998, 'episodes': 10456})]

#total episodes per year (top 5 years)
shows_table\
    .filter(F('episodes').not_null())\
    .groupby('year').sum(by='episodes').sort(key=1, reverse=True).limit(5).collect()

[(2012, 324658.0),
 (2011, 313426.0),
 (2017, 301113.0),
 (2013, 287416.0),
 (2014, 276226.0)]

Listing all tables in a SQLite database

Query.from_sqlite_query('shows.db',
    "SELECT name FROM sqlite_master WHERE type='table'"
).collect()

Step	What it does
`from_sqlite_query(db, sql)`	Runs a raw SQL query instead of reading a full table
`.collect()`	Returns all results as a list of dicts

Tip: Use from_sqlite_query() whenever you need custom SQL — joins, subqueries, or schema inspection like this example.

tables = Query.from_sqlite_query('../data/shows.db', "SELECT name FROM sqlite_master WHERE type='table'").collect()
tables

[{'name': 'genres'},
 {'name': 'people'},
 {'name': 'ratings'},
 {'name': 'shows'},
 {'name': 'stars'},
 {'name': 'writers'}]

ratings = Query.from_sqlite('../data/shows.db', 'ratings')
ratings.show()

show_id | rating | votes
------------------------
62614   | 6.7    | 392  
63881   | 7.9    | 1224 
63962   | 7.9    | 4518 
65269   | 7.5    | 141  
65270   | 7.5    | 33

Joining tables with custom SQL

joined = Query.from_sqlite_query('shows.db', """
    SELECT s.title, s.year, s.episodes, r.rating, r.votes
    FROM shows s JOIN ratings r ON s.id = r.show_id
""")
joined.show()

Step	What it does
`from_sqlite_query(db, sql)`	Runs a SQL `JOIN` across `shows` and `ratings` tables
`ON s.id = r.show_id`	Matches each show to its rating by ID
`.show()`	Previews the first 5 rows of the merged result

Tip: Once joined, you can chain any Query operations — e.g. .filter(F('rating') > 8.0).sort('votes', reverse=True) to find highly-rated, well-known shows.

joined = Query.from_sqlite_query('../data/shows.db', """
    SELECT s.title, s.year, s.episodes, r.rating, r.votes
    FROM shows s JOIN ratings r ON s.id = r.show_id
""")
joined.show()

title                       | year | episodes | rating | votes
--------------------------------------------------------------
Zeg 'ns Aaa                 | 1981 | 227      | 6.7    | 392  
Catweazle                   | 1970 | 26       | 7.9    | 1224 
UFO                         | 1970 | 26       | 7.9    | 4518 
Ace of Wands                | 1970 | 46       | 7.5    | 141  
The Adventures of Don Quick | 1970 | 6        | 7.5    | 33

# Top 10 highest rated shows (with at least 1000 votes)
joined.filter(F('votes') > 1000).sort(key='rating', reverse=True).limit(10).show(10)

title                      | year | episodes | rating | votes  
---------------------------------------------------------------
Coke Studio Pakistan       | 2008 | 109      | 9.6    | 1373   
Amrutham                   | 2001 | 61       | 9.6    | 1029   
Choufli Hal                | 2005 | 135      | 9.6    | 3265   
Geografens testamente      | 2011 | 38       | 9.6    | 1040   
Friday Five Sharp          | 2015 | 74       | 9.6    | 4571   
Breaking Bad               | 2008 | 62       | 9.5    | 2413272
Call Me Kevin              | 2016 | 3        | 9.5    | 1949   
The Why Files              | 2020 | 210      | 9.5    | 1142   
Upanishad Ganga            | 2012 | 50       | 9.5    | 1265   
The Surrealists' Top Chart | 1984 | 33       | 9.4    | 1787

Most voted shows with episode data

joined \
    .filter(F('episodes').not_null()) \
    .sort(key='votes', reverse=True) \
    .limit(5) \
    .show()

Step	What it does
`.filter(F('episodes').not_null())`	Excludes shows with missing episode counts
`.sort(key='votes', reverse=True)`	Sorts by vote count, most voted first
`.limit(5)`	Keeps the top 5 results
`.show()`	Pretty-prints as a table

Tip: Combine with a rating filter — e.g. .filter(F('votes') > 500000 & F('rating') > 8.5) — to find both massively popular and critically acclaimed shows.

# Most voted shows
joined.filter(F('episodes').not_null()).sort(key='votes', reverse=True).limit(5).show()

title            | year | episodes | rating | votes  
-----------------------------------------------------
Game of Thrones  | 2011 | 74       | 9.2    | 2491529
Breaking Bad     | 2008 | 62       | 9.5    | 2413272
Stranger Things  | 2016 | 42       | 8.6    | 1493344
The Walking Dead | 2010 | 177      | 8.1    | 1165465
Friends          | 1994 | 235      | 8.9    | 1164587

for t in tables:
    print(f"\n=== {t['name']} ===")
    Query.from_sqlite('../data/shows.db', t['name']).show(3)


=== genres ===
show_id | genre    
-------------------
62614   | Comedy   
63881   | Adventure
63881   | Comedy   

=== people ===
id | name            | birth
----------------------------
1  | Fred Astaire    | 1899 
2  | Lauren Bacall   | 1924 
3  | Brigitte Bardot | 1934 

=== ratings ===
show_id | rating | votes
------------------------
62614   | 6.7    | 392  
63881   | 7.9    | 1224 
63962   | 7.9    | 4518 

=== shows ===
id    | title       | year | episodes
-------------------------------------
62614 | Zeg 'ns Aaa | 1981 | 227     
63881 | Catweazle   | 1970 | 26      
63962 | UFO         | 1970 | 26      

=== stars ===
show_id | person_id
-------------------
62614   | 853892   
62614   | 393936   
62614   | 393936   

=== writers ===
show_id | person_id
-------------------
62614   | 887216   
62614   | 688754   
63881   | 139441

Best rated show per genre

Query.from_sqlite_query('shows.db', sql_q)
    .filter(F('votes') > 1000)
    .groupby('genre').max(by='rating')
    .collect()

Think of this like sorting a pile of movies into genre buckets (Action, Comedy, Drama…), then picking the highest-rated show from each bucket. We also require at least 1,000 votes so obscure shows don’t sneak in with a perfect score from 5 friends.

# joining shows, ratings and genre on show_id
sql_q ="""
    SELECT s.title, s.year, g.genre, r.rating, r.votes
    FROM shows s 
    JOIN ratings r ON s.id = r.show_id
    JOIN genres g ON s.id = g.show_id
"""

# Best rated shows by genre
Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\
   .groupby('genre').max(by='rating').collect(3)

[('Adventure',
  {'title': 'Geografens testamente',
   'year': 2011,
   'genre': 'Adventure',
   'rating': 9.6,
   'votes': 1040}),
 ('Comedy',
  {'title': 'Amrutham',
   'year': 2001,
   'genre': 'Comedy',
   'rating': 9.6,
   'votes': 1029}),
 ('Family',
  {'title': "De'ah Da'iah",
   'year': 2008,
   'genre': 'Family',
   'rating': 9.4,
   'votes': 1208})]

Same query, but sorted alphabetically (Z → A)

...max(by='rating').sort(reverse=True).collect()

This is identical to the query above, with one addition: .sort(reverse=True) arranges the results alphabetically by genre name, from Z to A (Western → Action). Without it, genres appear in the order they were first encountered in the data.

# Best rated shows by genre
Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\
   .groupby('genre').max(by='rating').sort(reverse=True).collect(3) # short alphbatically in reverse order

[('Western',
  {'title': 'Yellowstone',
   'year': 2018,
   'genre': 'Western',
   'rating': 8.6,
   'votes': 291730}),
 ('War',
  {'title': 'The Palestinian Alienation',
   'year': 2004,
   'genre': 'War',
   'rating': 9.4,
   'votes': 1589}),
 ('Thriller',
  {'title': 'Breaking Bad',
   'year': 2008,
   'genre': 'Thriller',
   'rating': 9.5,
   'votes': 2413272})]

Most prolific actors using SQL aggregation

Query.from_sqlite_query('shows.db', """
    SELECT p.name, COUNT(*) as show_count
    FROM people p JOIN stars s ON p.id = s.person_id
    GROUP BY p.id ORDER BY show_count DESC LIMIT 10
""").show()

Step	What it does
`JOIN stars s ON p.id = s.person_id`	Links each person to the shows they starred in
`COUNT(*) as show_count`	Counts number of shows per person
`GROUP BY p.id`	Aggregates one row per person
`ORDER BY show_count DESC LIMIT 10`	Returns top 10 most-appearing actors
`.show()`	Pretty-prints the results

Tip: This query uses SQL-side aggregation for performance — ideal when working with large tables. You can swap stars for writers to find the most prolific writers instead.

# Most prolific actors
Query.from_sqlite_query('../data/shows.db', """
    SELECT p.name, COUNT(*) as show_count
    FROM people p JOIN stars s ON p.id = s.person_id
    GROUP BY p.id ORDER BY show_count DESC LIMIT 10
""").show()

name              | show_count
------------------------------
Frank Welker      | 449       
Dee Bradley Baker | 320       
Grey DeLisle      | 312       
Tom Kenny         | 300       
Jeff Bennett      | 292