# → \[1, 1, 3, 4, 5\] ## What is lazyq? **lazyq** is a lightweight, chainable query pipeline for Python. Instead of executing operations immediately, lazyq builds up a **map of instructions** and only runs them when you actually need the results. This makes it memory-efficient and great for working with large datasets! ## Usage ### Installation Install latest from the GitHub [repository](https://github.com/vikasAWA/lazyq): ``` sh $ pip install git+https://github.com/vikasAWA/lazyq.git ``` or from [pypi](https://pypi.org/project/lazyq/) ``` sh $ pip install lazyq ``` ### Documentation Documentation can be found hosted on this GitHub [repository](https://github.com/vikasAWA/lazyq)’s [pages](https://vikasAWA.github.io/lazyq/). Additionally you can find package manager specific guidelines on [conda](https://anaconda.org/vikasAWA/lazyq) and [pypi](https://pypi.org/project/lazyq/) respectively. ## How to use ## Usage lazyq lets you build queries step by step. Let’s explore with some country data! ## Datasets ``` python scores = [ {'name': 'Alice', 'age': 30, 'score': 85}, {'name': 'Bob', 'age': 17, 'score': 92}, {'name': 'Carol', 'age': 25, 'score': 60}, {'name': 'Dave', 'age': 19, 'score': 74}, {'name': 'Eve', 'age': 22, 'score': 91}, ] data = [ {'dept': 'eng', 'name': 'Alice'}, {'dept': 'hr', 'name': 'Bob'}, {'dept': 'eng', 'name': 'Carol'}, {'dept': 'hr', 'name': 'Dave'}, ] sales = [ {'region': 'north', 'product': 'laptop', 'units': 10, 'price': 999}, {'region': 'south', 'product': 'phone', 'units': 25, 'price': 499}, {'region': 'north', 'product': 'phone', 'units': 15, 'price': 499}, {'region': 'south', 'product': 'laptop', 'units': 8, 'price': 999}, {'region': 'east', 'product': 'tablet', 'units': 20, 'price': 299}, {'region': 'east', 'product': 'laptop', 'units': 5, 'price': 999}, {'region': 'north', 'product': 'tablet', 'units': 12, 'price': 299}, ] movies = [ {'title': 'Inception', 'genre': 'scifi', 'year': 2010, 'rating': 8.8, 'gross_m': 836}, {'title': 'The Dark Knight','genre': 'action', 'year': 2008, 'rating': 9.0, 'gross_m': 1005}, {'title': 'Interstellar', 'genre': 'scifi', 'year': 2014, 'rating': 8.6, 'gross_m': 773}, {'title': 'Parasite', 'genre': 'thriller','year': 2019, 'rating': 8.6, 'gross_m': 258}, {'title': 'Avengers', 'genre': 'action', 'year': 2019, 'rating': 8.4, 'gross_m': 2798}, {'title': 'Joker', 'genre': 'thriller','year': 2019, 'rating': 8.4, 'gross_m': 1079}, {'title': 'Dune', 'genre': 'scifi', 'year': 2021, 'rating': 8.0, 'gross_m': 401}, ] ``` ## Examples **Listing all available Query operations** ``` python Query.tools() ``` A convenience method that prints a formatted reference of all available [`Query`](https://vikasAWA.github.io/lazyq/core.html#query) operations, grouped by category:

Category	What it shows
📦 Data Sources	Methods to load data from iterables, CSV, SQLite, JSON, YAML, and Excel
🔄 Transformations	Row-level operations: `map`, `filter`, `limit`, `sort`, `groupby`
📊 Aggregations	Summary operations: `sum`, `count`, `max`, `min`
📤 Output	`collect` to get a list, `show` to pretty-print a table

> **Tip:** Run > [`Query.tools()`](https://vikasAWA.github.io/lazyq/core.html#query.tools) > any time you need a quick reminder of what’s available — no imports or > docs needed. ``` python Query.tools() ``` 📦 Data Sources: from_iterable(data) → Create Query from list/iterable from_csv(path) → Load data from CSV file from_sqlite(db, table) → Load table from SQLite database from_sqlite_query(db, query) → Run custom SQL query on SQLite database from_json(path, key=None, stream=False) → Load data from JSON file from_yaml(path, key=None) → Load data from YAML file from_excel(path, sheet_name=None) → Load data from Excel file 🔄 Transformations: map(fn) → Transform each row filter(fn) → Filter rows by condition limit(n) → Limit number of rows sort(key, reverse=False) → Sort rows groupby(key) → Group rows by a key 📊 Aggregations: sum(by=None) → Sum values count() → Count rows max(by=None) → Maximum value/row min(by=None) → Minimum value/row 📤 Output: collect(n=None) → Return results as list show(n=5) → Pretty print results ``` python # plain numbers Query.from_iterable([3, 1, 4, 1, 5]).sort().collect() ``` [1, 1, 3, 4, 5] **Chaining filters, sorts, and display** Filter rows by a condition, sort by a field, and preview results with `.show()`. ``` python Query.from_iterable(scores) \ .filter(F('age') > 19) \ .sort(key='score', reverse=True) \ .show() ``` - `F('age') > 19` — keeps only rows where `age` is greater than 19 - `.sort(key='score', reverse=True)` — sorts by `score`, highest first - `.show()` — pretty-prints the top 5 rows as a table ``` python # chained Query.from_iterable(scores)\ .filter(F('age') > 19)\ .sort(key='score', reverse=True)\ .show() ``` name | age | score ------------------- Eve | 22 | 91 Alice | 30 | 85 Carol | 25 | 60 **Grouping and aggregating** Use `.groupby(key)` followed by `.sum(field)` to total a numeric field across groups. ``` python Query.from_iterable(sales).groupby('region').sum('units').collect() # → [('north', 37.0), ('south', 33.0), ('east', 25.0)] ``` - `.groupby('region')` — groups rows by the `'region'` field - `.sum('units')` — sums the `'units'` field within each group - `.collect()` — returns results as a list of `(group_key, total)` tuples ``` python Query.from_iterable(sales).groupby('region').sum('units').collect() ``` [('north', 37.0), ('south', 33.0), ('east', 25.0)] **Top N results from a filtered, sorted dataset** Filter rows by a field value, sort by another field, limit results, and display as a table. ``` python Query.from_iterable(movies) \ .filter(F('genre') == 'scifi') \ .sort('rating', reverse=True) \ .limit(3) \ .show() ```

Step	What it does
`filter(F('genre') == 'scifi')`	Keeps only rows where `genre` is `'scifi'`
`.sort('rating', reverse=True)`	Sorts by `rating`, highest first
`.limit(3)`	Takes only the top 3 results
`.show()`	Pretty-prints results as a table

Step	What it does
`from_sqlite('shows.db', 'shows')`	Connects to `shows.db` and reads all rows from the `shows` table
`.show()`	Pretty-prints the first 5 rows as a table

**Output:** id | title | year | episodes ----------------------------------------------------- 62614 | Zeg 'ns Aaa | 1981 | 227 63881 | Catweazle | 1970 | 26 ... > **Tip:** You can chain further operations after `from_sqlite()`, for > example: > > ``` python > Query.from_sqlite('shows.db', 'shows') \ > .filter(F('episodes').not_null()) \ > .filter(F('year') > 1975) \ > .sort('episodes', reverse=True) \ > .show() > ``` > > For custom SQL, use `Query.from_sqlite_query(db, query)` instead. ------------------------------------------------------------------------ ``` python shows_table = Query.from_sqlite('../data/shows.db', 'shows') shows_table.show() ``` id | title | year | episodes ----------------------------------------------------- 62614 | Zeg 'ns Aaa | 1981 | 227 63881 | Catweazle | 1970 | 26 63962 | UFO | 1970 | 26 65269 | Ace of Wands | 1970 | 46 65270 | The Adventures of Don Quick | 1970 | 6 ``` python shows_table \ .filter(F('year').not_null()) \ .filter(F('episodes').not_null()) \ .show() ``` id | title | year | episodes ----------------------------------------------------- 62614 | Zeg 'ns Aaa | 1981 | 227 63881 | Catweazle | 1970 | 26 63962 | UFO | 1970 | 26 65269 | Ace of Wands | 1970 | 46 65270 | The Adventures of Don Quick | 1970 | 6 **Finding the top N rows by a numeric field** Filter out nulls, sort descending, and limit to get the top results. ``` python Query.from_sqlite('shows.db', 'shows') \ .filter(F('episodes').not_null()) \ .sort(key='episodes', reverse=True) \ .limit(5) \ .show() ```

Step	What it does
`from_sqlite('shows.db', 'shows')`	Reads all rows from the `shows` table
`.filter(F('episodes').not_null())`	Excludes rows where `episodes` is `None`/missing
`.sort(key='episodes', reverse=True)`	Sorts by episode count, highest first
`.limit(5)`	Keeps only the top 5 results
`.show()`	Pretty-prints as a table

**Output:** id | title | year | episodes ------------------------------------------------------- 12164062 | NRK Nyheter | 2007 | 18593 69658 | The Young and the Restless | 1973 | 13297 ... > **Tip:** Replace `'episodes'` with any numeric field (e.g. `'year'`) > and adjust `limit(5)` to get different top-N results. ------------------------------------------------------------------------ ``` python # Top 5 shows by episodes shows_table\ .filter(F('episodes').not_null())\ .sort(key='episodes', reverse=True)\ .limit(5).show() ``` id | title | year | episodes ------------------------------------------------------- 12164062 | NRK Nyheter | 2007 | 18593 69658 | The Young and the Restless | 1973 | 13297 988827 | See the World by Train | 1987 | 10674 33062164 | WREG News 3 at 10PM | 1975 | 10663 175383 | Barátok közt | 1998 | 10456 **Counting records per group, sorted by frequency** ``` python Query.from_sqlite('shows.db', 'shows') \ .filter(F('year').not_null()) \ .groupby('year').count() \ .sort(key=1, reverse=True) \ .collect(7) ```

Step	What it does
`.filter(F('year').not_null())`	Drops rows with no year
`.groupby('year').count()`	Counts shows per year, yielding `(year, count)` tuples
`.sort(key=1, reverse=True)`	Sorts by count (index `1`), highest first
`.collect(7)`	Returns top 7 results as a list

> **Tip:** `key=1` refers to the second element of the tuple (the > count). Use `key=0` to sort alphabetically/chronologically by the > group key instead. ``` python # Count shows per year shows_table\ .filter(F('year').not_null())\ .groupby('year').count().sort(key=1, reverse=True).collect(7) ``` [(2020, 12705), (2021, 12241), (2018, 11646), (2017, 11591), (2019, 11534), (2016, 11339), (2015, 11121)] ``` python print("=== Sort ===") Query.from_iterable(scores).sort(key='age').show() print("\n=== GroupedQuery count ===") print(Query.from_iterable(movies).groupby('genre').count().collect()) print("\n=== GroupedQuery max ===") print(Query.from_iterable(movies).groupby('genre').max(by='rating').collect()) print("\n=== Double iteration ===") q = Query.from_iterable(scores).limit(3) print(q.collect()) print(q.collect()) ``` === Sort === name | age | score ------------------- Bob | 17 | 92 Dave | 19 | 74 Eve | 22 | 91 Carol | 25 | 60 Alice | 30 | 85 === GroupedQuery count === [('scifi', 3), ('action', 2), ('thriller', 2)] === GroupedQuery max === [('scifi', {'title': 'Inception', 'genre': 'scifi', 'year': 2010, 'rating': 8.8, 'gross_m': 836}), ('action', {'title': 'The Dark Knight', 'genre': 'action', 'year': 2008, 'rating': 9.0, 'gross_m': 1005}), ('thriller', {'title': 'Parasite', 'genre': 'thriller', 'year': 2019, 'rating': 8.6, 'gross_m': 258})] === Double iteration === [{'name': 'Alice', 'age': 30, 'score': 85}, {'name': 'Bob', 'age': 17, 'score': 92}, {'name': 'Carol', 'age': 25, 'score': 60}] [{'name': 'Alice', 'age': 30, 'score': 85}, {'name': 'Bob', 'age': 17, 'score': 92}, {'name': 'Carol', 'age': 25, 'score': 60}] **Top shows by episode count, grouped by year**

Step	What it does
`.filter(F('episodes').not_null())`	Skips shows with no episode count
`.groupby('year').max(by='episodes')`	Finds the most-episodes show per year, yielding `(year, row)` tuples
`.sort(key=lambda x: x[1]['episodes'], reverse=True)`	Sorts groups by their max episode count, highest first
`.collect(4)`	Returns the top 4 results as a list

> **Tip:** The `lambda x: x[1]['episodes']` key drills into the nested > row dict inside each `(year, row)` tuple. Use `key=lambda x: x[0]` to > sort by year instead. ``` python #Top shows by episode count, grouped by year shows_table\ .filter(F('episodes').not_null()) \ .groupby('year').max(by='episodes') \ .sort(key=lambda x: x[1]['episodes'], reverse=True) \ .collect(5) ``` [(2007, {'id': 12164062, 'title': 'NRK Nyheter', 'year': 2007, 'episodes': 18593}), (1973, {'id': 69658, 'title': 'The Young and the Restless', 'year': 1973, 'episodes': 13297}), (1987, {'id': 988827, 'title': 'See the World by Train', 'year': 1987, 'episodes': 10674}), (1975, {'id': 33062164, 'title': 'WREG News 3 at 10PM', 'year': 1975, 'episodes': 10663}), (1998, {'id': 175383, 'title': 'Barátok közt', 'year': 1998, 'episodes': 10456})] ``` python #total episodes per year (top 5 years) shows_table\ .filter(F('episodes').not_null())\ .groupby('year').sum(by='episodes').sort(key=1, reverse=True).limit(5).collect() ``` [(2012, 324658.0), (2011, 313426.0), (2017, 301113.0), (2013, 287416.0), (2014, 276226.0)] **Listing all tables in a SQLite database** ``` python Query.from_sqlite_query('shows.db', "SELECT name FROM sqlite_master WHERE type='table'" ).collect() ```

Step	What it does
`from_sqlite_query(db, sql)`	Runs a raw SQL query instead of reading a full table
`.collect()`	Returns all results as a list of dicts

> **Tip:** Use `from_sqlite_query()` whenever you need custom SQL — > joins, subqueries, or schema inspection like this example. ``` python tables = Query.from_sqlite_query('../data/shows.db', "SELECT name FROM sqlite_master WHERE type='table'").collect() tables ``` [{'name': 'genres'}, {'name': 'people'}, {'name': 'ratings'}, {'name': 'shows'}, {'name': 'stars'}, {'name': 'writers'}] ``` python ratings = Query.from_sqlite('../data/shows.db', 'ratings') ratings.show() ``` show_id | rating | votes ------------------------ 62614 | 6.7 | 392 63881 | 7.9 | 1224 63962 | 7.9 | 4518 65269 | 7.5 | 141 65270 | 7.5 | 33 **Joining tables with custom SQL** ``` python joined = Query.from_sqlite_query('shows.db', """ SELECT s.title, s.year, s.episodes, r.rating, r.votes FROM shows s JOIN ratings r ON s.id = r.show_id """) joined.show() ```

Step	What it does
`from_sqlite_query(db, sql)`	Runs a SQL `JOIN` across `shows` and `ratings` tables
`ON s.id = r.show_id`	Matches each show to its rating by ID
`.show()`	Previews the first 5 rows of the merged result

> **Tip:** Once joined, you can chain any Query operations — > e.g. `.filter(F('rating') > 8.0).sort('votes', reverse=True)` to find > highly-rated, well-known shows. ``` python joined = Query.from_sqlite_query('../data/shows.db', """ SELECT s.title, s.year, s.episodes, r.rating, r.votes FROM shows s JOIN ratings r ON s.id = r.show_id """) joined.show() ``` title | year | episodes | rating | votes -------------------------------------------------------------- Zeg 'ns Aaa | 1981 | 227 | 6.7 | 392 Catweazle | 1970 | 26 | 7.9 | 1224 UFO | 1970 | 26 | 7.9 | 4518 Ace of Wands | 1970 | 46 | 7.5 | 141 The Adventures of Don Quick | 1970 | 6 | 7.5 | 33 ``` python # Top 10 highest rated shows (with at least 1000 votes) joined.filter(F('votes') > 1000).sort(key='rating', reverse=True).limit(10).show(10) ``` title | year | episodes | rating | votes --------------------------------------------------------------- Coke Studio Pakistan | 2008 | 109 | 9.6 | 1373 Amrutham | 2001 | 61 | 9.6 | 1029 Choufli Hal | 2005 | 135 | 9.6 | 3265 Geografens testamente | 2011 | 38 | 9.6 | 1040 Friday Five Sharp | 2015 | 74 | 9.6 | 4571 Breaking Bad | 2008 | 62 | 9.5 | 2413272 Call Me Kevin | 2016 | 3 | 9.5 | 1949 The Why Files | 2020 | 210 | 9.5 | 1142 Upanishad Ganga | 2012 | 50 | 9.5 | 1265 The Surrealists' Top Chart | 1984 | 33 | 9.4 | 1787 **Most voted shows with episode data** ``` python joined \ .filter(F('episodes').not_null()) \ .sort(key='votes', reverse=True) \ .limit(5) \ .show() ```

Step	What it does
`.filter(F('episodes').not_null())`	Excludes shows with missing episode counts
`.sort(key='votes', reverse=True)`	Sorts by vote count, most voted first
`.limit(5)`	Keeps the top 5 results
`.show()`	Pretty-prints as a table

> **Tip:** Combine with a rating filter — > e.g. `.filter(F('votes') > 500000 & F('rating') > 8.5)` — to find both > massively popular *and* critically acclaimed shows. ``` python # Most voted shows joined.filter(F('episodes').not_null()).sort(key='votes', reverse=True).limit(5).show() ``` title | year | episodes | rating | votes ----------------------------------------------------- Game of Thrones | 2011 | 74 | 9.2 | 2491529 Breaking Bad | 2008 | 62 | 9.5 | 2413272 Stranger Things | 2016 | 42 | 8.6 | 1493344 The Walking Dead | 2010 | 177 | 8.1 | 1165465 Friends | 1994 | 235 | 8.9 | 1164587 ``` python for t in tables: print(f"\n=== {t['name']} ===") Query.from_sqlite('../data/shows.db', t['name']).show(3) ``` === genres === show_id | genre ------------------- 62614 | Comedy 63881 | Adventure 63881 | Comedy === people === id | name | birth ---------------------------- 1 | Fred Astaire | 1899 2 | Lauren Bacall | 1924 3 | Brigitte Bardot | 1934 === ratings === show_id | rating | votes ------------------------ 62614 | 6.7 | 392 63881 | 7.9 | 1224 63962 | 7.9 | 4518 === shows === id | title | year | episodes ------------------------------------- 62614 | Zeg 'ns Aaa | 1981 | 227 63881 | Catweazle | 1970 | 26 63962 | UFO | 1970 | 26 === stars === show_id | person_id ------------------- 62614 | 853892 62614 | 393936 62614 | 393936 === writers === show_id | person_id ------------------- 62614 | 887216 62614 | 688754 63881 | 139441 **Best rated show per genre** ``` python Query.from_sqlite_query('shows.db', sql_q) .filter(F('votes') > 1000) .groupby('genre').max(by='rating') .collect() ``` Think of this like sorting a pile of movies into genre buckets (Action, Comedy, Drama…), then picking the **highest-rated show** from each bucket. We also require at least **1,000 votes** so obscure shows don’t sneak in with a perfect score from 5 friends. ------------------------------------------------------------------------ ``` python # joining shows, ratings and genre on show_id sql_q =""" SELECT s.title, s.year, g.genre, r.rating, r.votes FROM shows s JOIN ratings r ON s.id = r.show_id JOIN genres g ON s.id = g.show_id """ ``` ``` python # Best rated shows by genre Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\ .groupby('genre').max(by='rating').collect(3) ``` [('Adventure', {'title': 'Geografens testamente', 'year': 2011, 'genre': 'Adventure', 'rating': 9.6, 'votes': 1040}), ('Comedy', {'title': 'Amrutham', 'year': 2001, 'genre': 'Comedy', 'rating': 9.6, 'votes': 1029}), ('Family', {'title': "De'ah Da'iah", 'year': 2008, 'genre': 'Family', 'rating': 9.4, 'votes': 1208})] **Same query, but sorted alphabetically (Z → A)** ``` python ...max(by='rating').sort(reverse=True).collect() ``` This is identical to the query above, with one addition: `.sort(reverse=True)` arranges the results **alphabetically by genre name, from Z to A** (Western → Action). Without it, genres appear in the order they were first encountered in the data. ``` python # Best rated shows by genre Query.from_sqlite_query('../data/shows.db', sql_q).filter(F('votes') > 1000)\ .groupby('genre').max(by='rating').sort(reverse=True).collect(3) # short alphbatically in reverse order ``` [('Western', {'title': 'Yellowstone', 'year': 2018, 'genre': 'Western', 'rating': 8.6, 'votes': 291730}), ('War', {'title': 'The Palestinian Alienation', 'year': 2004, 'genre': 'War', 'rating': 9.4, 'votes': 1589}), ('Thriller', {'title': 'Breaking Bad', 'year': 2008, 'genre': 'Thriller', 'rating': 9.5, 'votes': 2413272})] **Most prolific actors using SQL aggregation** ``` python Query.from_sqlite_query('shows.db', """ SELECT p.name, COUNT(*) as show_count FROM people p JOIN stars s ON p.id = s.person_id GROUP BY p.id ORDER BY show_count DESC LIMIT 10 """).show() ```

Step	What it does
`JOIN stars s ON p.id = s.person_id`	Links each person to the shows they starred in
`COUNT(*) as show_count`	Counts number of shows per person
`GROUP BY p.id`	Aggregates one row per person
`ORDER BY show_count DESC LIMIT 10`	Returns top 10 most-appearing actors
`.show()`	Pretty-prints the results

> **Tip:** This query uses SQL-side aggregation for performance — ideal > when working with large tables. You can swap `stars` for `writers` to > find the most prolific writers instead. ``` python # Most prolific actors Query.from_sqlite_query('../data/shows.db', """ SELECT p.name, COUNT(*) as show_count FROM people p JOIN stars s ON p.id = s.person_id GROUP BY p.id ORDER BY show_count DESC LIMIT 10 """).show() ``` name | show_count ------------------------------ Frank Welker | 449 Dee Bradley Baker | 320 Grey DeLisle | 312 Tom Kenny | 300 Jeff Bennett | 292