Fix Missing Records from Pagination in Google Sheets: NextPageToken Troubleshooting Guide for Developers


Missing rows during pagination usually happen because the “next page” state (page number, offset, or nextPageToken) is not handled deterministically across retries, sorting changes, or partial writes—so your Google Sheets dataset ends up with gaps that look random but follow a pattern.

To resolve the issue, you need to identify where records actually disappear (source API, transformation layer, or the Sheets write step) and then rebuild pagination as a resumable loop that can survive timeouts, quotas, and reruns without skipping or duplicating data.

After pagination is fixed, the next goal is proving completeness: you should validate counts, track tokens, and enforce idempotency so the sheet becomes a reliable “materialized view” of your upstream data—not a best-effort export.

Below is a practical, developer-focused workflow that turns “google sheets pagination missing records” into a repeatable checklist and implementation pattern you can apply across any API-to-Sheets pipeline.



Why do Google Sheets imports show missing records when you paginate results?

Google Sheets imports show missing records during pagination because the pipeline loses or misapplies page state (offset, cursor, or nextPageToken) between requests, causing skipped pages, overwritten ranges, or “partial runs” that never resume from the correct checkpoint.

To better understand the issue, you need to treat pagination as a state machine with strict rules about ordering, checkpoints, and write verification—otherwise “missing records” becomes an unavoidable symptom.

What does “pagination missing records” mean in a Sheets workflow?

In a Sheets workflow, “pagination missing records” means your final sheet contains fewer unique entities than the upstream dataset you intended to fetch, even though your process reported success or only minor warnings.

  • What you observe: gaps in IDs, missing dates, missing customers/leads/orders, or a sudden drop after a certain row.
  • What it usually implies: at least one page was never fetched, was fetched but never written, or was written into the wrong range.
  • Why it’s deceptive: pagination can fail silently when your code stops early, retries incorrectly, or updates the same range repeatedly.

For example, if you fetch 100 records per page and accidentally stop when you see an empty “items” array on a transient response, you’ll miss every record after that point—even though earlier pages look correct.

Where records disappear: source API, transport, or Sheets write step?

Records can disappear in three places, and the fastest fix comes from pinpointing which layer is leaking data before you touch the code.

  • Source API layer: the API returns inconsistent pages because sorting is unstable, filters shift over time, or you request the wrong field set for later pages.
  • Transport/transformation layer: records are dropped during mapping (null checks, schema mismatch, array flattening) or due to an error handler that “continues” without storing the page.
  • Google Sheets write layer: writes succeed partially, overwrite prior rows, or target the wrong A1 range, so the sheet looks incomplete even if fetch was complete.

A simple diagnostic: log the number of items fetched per page and the number of rows written per page. If fetch counts are correct but write counts are lower, the problem is downstream of the API. If fetch counts drop unexpectedly, it’s pagination/state.
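This diagnostic can be sketched in a few lines of Python. The `run_log` structure here is a hypothetical stand-in for whatever per-page logging your pipeline emits; the point is only the comparison of fetched versus written counts:

```python
# A minimal sketch of the per-page diagnostic described above.

def find_write_gaps(run_log):
    """Return (page_index, fetched, written) for pages that lost rows downstream."""
    return [
        (p["page_index"], p["fetched_count"], p["written_count"])
        for p in run_log
        if p["written_count"] < p["fetched_count"]
    ]

run_log = [
    {"page_index": 1, "fetched_count": 100, "written_count": 100},
    {"page_index": 2, "fetched_count": 100, "written_count": 100},
    {"page_index": 3, "fetched_count": 100, "written_count": 98},  # write gap
]
print(find_write_gaps(run_log))  # → [(3, 100, 98)]
```

An empty result with a too-low sheet row count points upstream, at pagination state rather than the write step.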

Which limits in Google Sheets can look like pagination?

Several Sheets behaviors can mimic pagination bugs, especially when you’re doing high-frequency updates or large batch writes.

  • Quota/rate limiting: exceeding per-minute request limits can trigger 429 errors, which cause partial runs if you don’t retry correctly with backoff.
  • Request size and batch behavior: large updates may fail validation or be rejected, and some batch patterns fail the entire request if a single sub-request is invalid.
  • Range misalignment: writing page 2 to the same range as page 1 creates the illusion that page 2 never existed.

In other words, “missing pages” can be a pure write problem: you did fetch the data, but your sheet never reflected it due to quota, batching, or incorrect range math.


Is pagination always the cause of missing records in Google Sheets?

No, pagination is not always the cause of missing records in Google Sheets, because missing rows can also come from write-range mistakes, mapping drops, quota timeouts, and retry logic that overwrites or truncates data—but pagination is still the top suspect when gaps follow page boundaries.

However, you can avoid guesswork by applying a quick triage that separates pagination-state failures from Sheets-write failures.

Yes/No decision checklist (fast triage)

Use this “Yes/No” checklist to determine whether you should fix pagination first or investigate the write layer first.

  • Do missing records start at a consistent page boundary (e.g., after every 100/500/1000 rows)?
  • Do logs show your loop ended early (no error) before nextPageToken became null/empty?
  • Do you see repeated tokens/offsets across retries?
  • Do you fetch all pages (counts match) while the sheet row count is smaller?
  • Do you see “successful” writes but data appears overwritten or shifted?

If you answer “Yes” to any of the first three, you likely have a pagination-state issue. If you answer “Yes” to either of the last two, you likely have a write/mapping issue.

Three strong signs the bug is pagination logic

Pagination logic is the culprit when your page progression is not monotonic, not resumable, or not tied to a stable ordering.

  • Sign 1: You rely on “pageNumber++” without verifying the API’s sort order is stable across requests.
  • Sign 2: You treat a transient empty page as “end of dataset” instead of retrying and validating.
  • Sign 3: You don’t persist the last successful cursor/offset, so a timeout causes you to restart incorrectly and skip or duplicate pages.

When these signs exist, “missing records” is predictable: every interruption or reorder event creates a hole.

Three strong signs the bug is range/write logic

Write logic is the culprit when fetch counts look correct but the sheet doesn’t reflect them in a one-to-one way.

  • Sign 1: Your target range is constant (e.g., always A2:Z) instead of moving forward with the current row pointer.
  • Sign 2: Your pipeline retries a failed write by reusing the same range, creating duplicate records or overwriting blocks unpredictably.
  • Sign 3: Your mapping drops rows (null filters, required fields) and you don’t log how many were dropped and why.

In this scenario, fixing pagination alone won’t help; you must fix range calculation, write batching, and mapping transparency.


What are the most common pagination patterns that create missing records?

There are 4 main pagination patterns that create missing records—offset/limit, cursor/nextPageToken, time-window, and hybrid pagination—based on how the next page is calculated and how stable the underlying sort/filter conditions remain during the run.

Next, you’ll see exactly how each pattern fails in real-world Sheets imports and what “failure signature” to look for in logs.

Page-number (offset/limit) pagination pitfalls

Offset/limit pagination fails when the dataset changes between requests or when ordering is not stable, because offsets shift and you end up skipping or repeating records.

  • Unstable ordering: if you sort by a non-unique field (like updated_at) without a tie-breaker, records can move between pages.
  • Concurrent inserts/deletes: new records inserted “before” your current offset shift every subsequent page.
  • Stop conditions: stopping when “pageSize returned < requested” can be wrong if the API intermittently returns fewer items.

A classic symptom: you always miss “some” records in the middle, and reruns produce different missing IDs. That’s offset drift.

Cursor (nextPageToken) pagination pitfalls

Cursor-based pagination fails when you don’t persist the cursor, you reuse an old cursor, or you mistakenly treat the cursor as pageNumber, causing loops or early termination.

  • Token not saved: a timeout forces a restart from the beginning, and your “dedupe” logic might skip the wrong subset.
  • Token overwritten: multi-threading or async workers can race and store the wrong token.
  • Token mismatch: using a token from one filter/sort query with a different query leads to missing slices.

Cursor pagination is safer than offset in dynamic datasets—but only if your pipeline treats the cursor as a checkpoint and never “guesses” the next one.

Time-window pagination pitfalls (created_at/updated_at)

Time-window pagination fails when events arrive late, timestamps are updated retroactively, or you use non-overlapping windows that create gaps at boundaries.

  • Late arrivals: a record created earlier can appear after your window has moved past it.
  • Clock skew: different services write timestamps with different precision, causing boundary misses.
  • Inclusive/exclusive boundaries: using > vs >= incorrectly creates gaps or duplicates at window edges.

This is common in “daily export to Sheets” jobs: everything looks fine until you audit and find missing records around midnight boundaries or around high-load delays.

Hybrid pagination pitfalls (sorting + filtering)

Hybrid pagination fails when you combine filters with sorting but don’t ensure the sort key is unique and monotonic under the filter.

  • Filter changes: records move in/out of the filtered set during the run, causing “phantom holes.”
  • Non-unique sort keys: ties cause ambiguous page transitions, especially with offset-based pagination.
  • Partial field fetching: missing fields can break your mapping, and you silently drop rows while the fetch still “succeeds.”

This is where careful troubleshooting matters most: you must log the query parameters, sort order, and checkpoint values per page.


How do you fix pagination so every record is fetched before writing to Google Sheets?

The most reliable fix is to implement a resumable pagination loop in 4 steps—fetch page, persist checkpoint, write atomically, and verify counts—so that retries and restarts can continue from the last confirmed page without skipping or duplicating rows.

Below are proven patterns you can adapt whether you use nextPageToken, offset, or time windows.

Step-by-step: robust nextPageToken loop

A robust nextPageToken loop uses a single source of truth for the current token and writes a checkpoint only after both fetch and write succeed.

  1. Initialize: token = null; totalFetched = 0; nextWriteRow = 2 (or your header+1).
  2. Fetch: request page with token; validate response schema; capture itemsCount and newToken.
  3. Write: write items to a calculated range that starts at nextWriteRow; use one write per page (or batch) to avoid partial writes.
  4. Commit checkpoint: store newToken, nextWriteRow, itemsCount, and a runId in a log sheet or external store.
  5. Stop: end only when newToken is truly absent AND you have validated the last page is final (not transient empty).
  • Key rule: never update the saved token before the page is written and verified.
  • Key rule: never reuse a token with different query params than the page that generated it.

This loop prevents the most common failure: you fetched page N, crashed before writing it, resumed at page N+1, and permanently skipped N.
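The steps above can be sketched as a small Python loop. Here `fetch_page`, `write_page`, and `save_checkpoint` are hypothetical stand-ins for your API client, your Sheets write step, and your checkpoint store; the demo uses two fake pages in place of a real API:

```python
def sync_all(fetch_page, write_page, save_checkpoint, token=None, next_row=2):
    """Resumable cursor loop: the checkpoint advances only after the page
    has been both fetched and written, so a crash never skips a page."""
    total = 0
    while True:
        items, new_token = fetch_page(token)   # retrying here is safe: token unchanged
        if items:
            write_page(items, next_row)        # deterministic range per page
            next_row += len(items)
            total += len(items)
        save_checkpoint(new_token, next_row)   # commit only after the write succeeded
        if not new_token:                      # truly final page: token absent
            return total
        token = new_token

# Two fake pages standing in for the real API.
pages = {None: (["r1", "r2"], "tok_1"), "tok_1": (["r3"], None)}
written, checkpoints = [], []
total = sync_all(
    fetch_page=lambda t: pages[t],
    write_page=lambda items, row: written.extend(items),
    save_checkpoint=lambda tok, row: checkpoints.append((tok, row)),
)
print(total, written)  # → 3 ['r1', 'r2', 'r3']
```

The ordering inside the loop is the whole design: fetch, write, then commit, never the reverse.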

Step-by-step: stable ordering for offset pagination

Offset pagination can work if—and only if—you enforce a stable, unique ordering and you treat offsets as checkpoints with strict validation.

  1. Sort by a unique key: prefer (created_at, id) or just id if possible; never sort by a non-unique field alone.
  2. Lock the filter: keep filter conditions constant during the run; avoid “updated within last X minutes” without overlap.
  3. Fetch with limit: offset = pageIndex * limit; verify returned IDs are strictly increasing (or at least consistent).
  4. Detect drift: if you see IDs out of expected order, stop and switch to cursor or time-window mode.

Offset pagination isn’t “wrong,” but it demands stronger controls. In fast-changing datasets (tickets, events, leads), cursor pagination is usually safer.
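A minimal sketch of the drift check, assuming the source is sorted by a unique integer `id`; the `fetch` callable is a hypothetical stand-in for your API client:

```python
def fetch_all_with_offset(fetch, limit):
    """Offset loop that checks the unique sort key is strictly increasing,
    so offset drift fails loudly instead of silently skipping records."""
    collected, offset, last_id = [], 0, None
    while True:
        rows = fetch(offset=offset, limit=limit)  # must be sorted by unique id upstream
        if not rows:
            return collected
        ids = [r["id"] for r in rows]
        if ids != sorted(set(ids)):
            raise RuntimeError(f"unstable ordering within page at offset {offset}")
        if last_id is not None and ids[0] <= last_id:
            raise RuntimeError(f"offset drift detected at offset {offset}")
        last_id = ids[-1]
        collected.extend(rows)
        offset += limit

# Static dataset sorted by unique id (the safe case for offset pagination).
data = [{"id": i} for i in range(1, 8)]
fetch = lambda offset, limit: data[offset : offset + limit]
print(len(fetch_all_with_offset(fetch, limit=3)))  # → 7
```

Failing loudly on drift is deliberate: a raised error is recoverable, a silently skipped page is not.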

Step-by-step: time-window with overlap and de-duplication

Time-window pagination becomes reliable when you add overlap and dedupe by a stable unique key, so late arrivals are still captured on the next run.

  1. Choose window size: e.g., 15 minutes, 1 hour, or 1 day based on volume.
  2. Add overlap: if window is 1 hour, overlap by 5–10 minutes to catch late arrivals.
  3. Dedupe strategy: use a unique ID column in Sheets and prevent duplicates via upsert logic (not raw append-only).
  4. Checkpoint: store the last processed timestamp (and ideally last ID) after verifying the window results were written.

This design trades a small amount of repeated scanning for completeness, which is usually the right trade for reporting and analytics sheets.
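The overlap logic can be sketched as a window generator; the window and overlap sizes below are illustrative, not recommendations:

```python
from datetime import datetime, timedelta

def windows(start, end, size=timedelta(hours=1), overlap=timedelta(minutes=10)):
    """Yield [window_start, window_end) pairs; each window re-scans the last
    `overlap` of the previous one so late arrivals are re-fetched and then
    deduped by the upsert step."""
    cursor = start
    while cursor < end:
        yield max(start, cursor - overlap), min(end, cursor + size)
        cursor += size

ws = list(windows(datetime(2024, 1, 1, 0), datetime(2024, 1, 1, 3)))
for w_start, w_end in ws:
    print(w_start.time(), "→", w_end.time())
# → 00:00:00 → 01:00:00
#   00:50:00 → 02:00:00
#   01:50:00 → 03:00:00
```

Note the second and third windows start 10 minutes early; that re-scanned tail is the completeness guarantee, and the upsert step absorbs the resulting duplicates.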

Step-by-step: backoff + retry without duplicating rows

Retries fix transient failures, but they also create duplicates if you retry writes naively or if you repeat a page without idempotency.

  • Retry fetch safely: retries are safe if you don’t advance the cursor/offset until you get a valid response.
  • Retry write safely: retries are safe if you write to the same deterministic range for that page (not “append again”).
  • Idempotent rows: store a unique ID per row and treat the sheet as upsertable, not append-only, when retries are expected.

If you’re seeing duplicate records created in Google Sheets, the root cause is often “append on retry” rather than a pagination bug.
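Here is a small sketch of a write retry that stays duplicate-free because the page targets a fixed block of rows; `TransientError`, the `sheet` dict, and `write_page_7` are illustrative stand-ins, not a real Sheets client:

```python
import time

class TransientError(Exception):
    pass

def retry(fn, attempts=4, base_delay=0.01):
    """Exponential backoff; repeating fn is safe only because it writes
    a deterministic range instead of appending."""
    for attempt in range(attempts):
        try:
            return fn()
        except TransientError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

# The page always targets the same rows, so a retried write overwrites
# its own block instead of appending a duplicate copy.
sheet = {}
failures = [TransientError(), TransientError()]  # first two calls fail

def write_page_7():
    if failures:
        raise failures.pop()
    for i, value in enumerate(["a", "b", "c"]):
        sheet[702 + i] = value  # deterministic rows for page 7

retry(write_page_7)
print(sheet)  # → {702: 'a', 703: 'b', 704: 'c'}
```

Had `write_page_7` appended instead, the two failed attempts plus the success could have left up to three copies of the page.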

How can you confirm Google Sheets wrote every record without duplicates?

You can confirm Google Sheets wrote every record without duplicates by combining 3 controls—an ingestion log, idempotent keys, and checksum validation—so each page has a verifiable “fetched vs written” audit trail instead of assumptions.

More specifically, verification should be built into the pipeline so you detect missing records immediately, not weeks later during a manual audit.

Build an ingestion log: counts, tokens, and timestamps

An ingestion log is the simplest way to turn missing records into a measurable incident rather than a vague suspicion.

  • Log per page: runId, pageIndex, request parameters, checkpoint (offset/token/window), fetchedCount, writtenCount, firstID, lastID, startTime, endTime.
  • Log per run: expectedTotal (if known), actualTotalFetched, actualTotalWritten, retries, failures, and resume checkpoint.

When the sheet is missing records, the log tells you whether the gap is a fetch gap (missing page) or a write gap (page fetched but not written).
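A per-page log entry can be as simple as a dict appended to a log sheet or external store; the field names below are one reasonable shape, not a required schema:

```python
import uuid
from datetime import datetime, timezone

def page_log_entry(run_id, page_index, checkpoint, fetched, written, first_id, last_id):
    """One ingestion-log row per page."""
    return {
        "run_id": run_id,
        "page_index": page_index,
        "checkpoint": checkpoint,  # offset, token, or window bounds
        "fetched_count": fetched,
        "written_count": written,
        "first_id": first_id,
        "last_id": last_id,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }

entry = page_log_entry(str(uuid.uuid4()), 13, "tok_abc", 100, 98, "ord_1201", "ord_1300")
print(entry["fetched_count"] - entry["written_count"])  # → 2, a write gap to investigate
```

The `first_id`/`last_id` pair is what makes targeted re-fetches possible later: you can replay exactly the slice that leaked.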

Row-level idempotency: unique keys and upserts

Row-level idempotency means each record has a unique key (like order_id) and your write logic updates existing rows instead of blindly appending duplicates.

  • Unique key column: store an immutable ID in column A (or a dedicated key column).
  • Upsert rule: if ID exists, update that row; if not, append new row.
  • Why it matters: if you restart from a saved checkpoint but reprocess the last page, idempotency prevents duplication.

This is the best long-term defense against duplicates caused by retries, partial failures, or window overlap strategies.
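The upsert rule can be sketched against an in-memory list of dicts standing in for the sheet body; a real implementation would read the key column from the sheet and batch the writes:

```python
def upsert_rows(sheet_rows, fetched, key="id"):
    """Update rows whose key already exists; append the rest.
    Returns the number of newly appended rows."""
    index = {row[key]: i for i, row in enumerate(sheet_rows)}
    appended = 0
    for rec in fetched:
        if rec[key] in index:
            sheet_rows[index[rec[key]]] = rec  # update in place, no duplicate
        else:
            index[rec[key]] = len(sheet_rows)
            sheet_rows.append(rec)
            appended += 1
    return appended

rows = [{"id": "ord_1", "total": 10}]
n = upsert_rows(rows, [{"id": "ord_1", "total": 12}, {"id": "ord_2", "total": 7}])
print(n, rows)  # → 1 [{'id': 'ord_1', 'total': 12}, {'id': 'ord_2', 'total': 7}]
```

Because reprocessing a page only re-applies the same updates, this makes page-level retries and window overlap safe by construction.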

Checksum validation: hash the payload and compare

Checksum validation compares the upstream data fingerprint to the sheet fingerprint so you can detect silent drops even when row counts “look right.”

  • Per page checksum: hash stable fields (IDs + key attributes) and store the hash in the ingestion log.
  • Per run checksum: hash the concatenated page hashes to get a run-level fingerprint.
  • Reconcile: if the sheet checksum differs, you can re-fetch only the missing page ranges instead of re-running the entire export.

Checksum validation is especially helpful when mapping logic can drop rows due to schema changes or null-handling.
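A minimal sketch of a page checksum, hashing only stable key fields so the fingerprint is insensitive to row order:

```python
import hashlib

def page_checksum(records, fields=("id",)):
    """Order-independent fingerprint of a page: hash the sorted key tuples."""
    canon = sorted("|".join(str(r[f]) for f in fields) for r in records)
    return hashlib.sha256("\n".join(canon).encode()).hexdigest()

upstream = [{"id": 1}, {"id": 2}, {"id": 3}]
sheet = [{"id": 1}, {"id": 3}]  # one row was silently dropped

assert page_checksum(upstream) != page_checksum(sheet)                      # drop detected
assert page_checksum(upstream) == page_checksum(list(reversed(upstream)))   # order-insensitive
```

Storing the upstream hash in the ingestion log at fetch time lets you recompute the sheet-side hash later and reconcile page by page.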


Offset vs cursor pagination: which is safer for Google Sheets data pipelines?

Cursor pagination wins in consistency, offset pagination can be simpler for small static datasets, and time-window pagination is often optimal for incremental reporting—so the safest choice for Google Sheets pipelines is usually cursor-based (nextPageToken) when the source dataset changes during the run.

However, the best method depends on whether the pipeline needs perfect completeness, fast backfills, or easy “jump to page” debugging.

Comparison table: consistency, speed, complexity

This table contains a practical comparison of offset vs cursor pagination (and a time-window alternative) to help you choose a method that minimizes missing records in a Sheets pipeline.

| Method | Best for | Main risk | Missing-record likelihood (dynamic data) | Operational complexity |
| --- | --- | --- | --- | --- |
| Offset/Limit | Small, static datasets; easy page jumps | Offset drift when data changes | High | Low–Medium |
| Cursor / nextPageToken | Large or changing datasets; stable traversal | Bad checkpoint handling; token reuse | Low | Medium |
| Time-window + overlap | Incremental updates; reporting snapshots | Boundary gaps without overlap/dedupe | Low–Medium | Medium–High |

When offset pagination is acceptable

Offset pagination is acceptable when the dataset is effectively static during the export and you can enforce a stable, unique ordering.

  • Good candidates: a historical archive, a closed accounting period, a nightly snapshot table that doesn’t change during the run.
  • Required safeguards: unique sort key, consistent filter, and validation that page boundaries don’t drift.

In these cases, offset pagination can be easier to debug because you can re-run “page 42” directly.

When cursor pagination is the best choice

Cursor pagination is the best choice when records can be inserted, updated, or re-ordered while your export is running.

  • Good candidates: CRM leads, support tickets, events streams, orders, subscriptions, and any system with frequent writes.
  • Reason: cursors represent a server-side traversal state, so you don’t depend on fragile offsets.

If your goal is completeness, cursor pagination plus persisted checkpoints is usually the most robust path.

Practical recommendation for Sheets-based reporting

For Sheets-based reporting, use cursor pagination for backfills and large syncs, and time-window pagination with overlap for ongoing incremental updates—then enforce idempotency to eliminate duplicates from retries and overlaps.

  • Backfill run: cursor/nextPageToken, page logs, deterministic ranges.
  • Incremental run: time-window + overlap, upsert by unique key.
  • Ongoing guardrails: ingestion log, checksums, and a “resume from checkpoint” mechanism.

This combo prevents both “missing records” and the secondary pain of messy re-runs.

Up to this point, the focus has been on fixing pagination logic and verifying completeness. Next, the content expands into micro-level failure modes—quotas, timeouts, and payload errors—that frequently trigger partial runs and misleading “missing records” symptoms.

How do quotas, timeouts, and retries change pagination behavior in Google Sheets automations?

Quotas, timeouts, and retries change pagination behavior by interrupting runs mid-stream, forcing partial execution paths, and triggering replays—so if your pipeline doesn’t persist checkpoints and enforce idempotent writes, it will either skip pages (missing records) or replay pages (duplicates).

In addition, these operational constraints can look like “pagination bugs” even when pagination logic is correct, because your run never completed or never resumed safely.

How Sheets API quotas trigger partial runs and 429 errors

Sheets API usage limits can throttle high-frequency pipelines, which often shows up as missing records when the job stops and never resumes.

  • What happens: you hit a per-minute request limit, receive a 429 response, and your code exits or skips a page without retrying.
  • How it becomes missing records: the pipeline writes pages 1–12, fails on 13, and ends; the sheet looks “almost complete,” but pages 13+ never land.
  • Correct response: apply exponential backoff, retry the same page, and continue only after success—without advancing the cursor/offset.

According to a 2016 study from Stony Brook University’s Department of Computer Science, a modified backoff protocol can achieve expected constant throughput and improved robustness under disruption—supporting the idea that well-designed backoff maintains progress instead of collapsing under contention.
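A common shape for the backoff delay is “full jitter”: the ceiling doubles per attempt up to a cap, and the actual sleep is drawn uniformly below it to spread out retries. The constants here are illustrative:

```python
import random

def backoff_delay(attempt, base=1.0, cap=64.0):
    """Full-jitter exponential backoff for 429 responses: return a random
    duration in [0, min(cap, base * 2**attempt)] to sleep before retrying
    the SAME page (never advance the cursor on failure)."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

# Ceiling doubles per attempt, capped so long outages don't stall forever.
for attempt in range(8):
    print(attempt, "ceiling:", min(64.0, 2.0 ** attempt))
```

Jitter matters when many pipeline workers share a quota: without it, they all retry at the same instants and re-trigger the 429s in lockstep.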

Why timeouts split pages and how to resume safely

Timeouts split pages when long-running exports exceed execution limits (cloud functions, Apps Script time limits, CI job timeouts), causing incomplete pagination loops.

  • Risk: if you restart from “page 1” without checkpointing, you either waste time or create duplicates.
  • Safer resume: persist your last confirmed checkpoint (token/offset/window + nextWriteRow) and resume from there.
  • Write safety: use deterministic ranges for each page so a retry writes to the same block, not a new append.

This is the difference between a fragile export and a production-grade pipeline: the latter assumes timeouts will happen and treats resumption as a core feature.
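Checkpoint persistence is worth getting right: a sketch using write-then-rename so a timeout mid-save can never leave a half-written checkpoint file. The file path and JSON shape are illustrative; an external store works the same way conceptually:

```python
import json
import os
import tempfile

def save_checkpoint(path, token, next_write_row):
    """Atomically persist the last confirmed page state (write-then-rename)."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"token": token, "next_write_row": next_write_row}, f)
    os.replace(tmp, path)  # atomic rename: readers see old state or new, never partial

def load_checkpoint(path):
    """Resume from the last confirmed state, or start fresh."""
    if not os.path.exists(path):
        return {"token": None, "next_write_row": 2}  # fresh run: header is row 1
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.gettempdir(), "sheets_sync_checkpoint.json")
save_checkpoint(path, "tok_13", 1302)
print(load_checkpoint(path))  # → {'token': 'tok_13', 'next_write_row': 1302}
```

On restart, the loop reads this state and resumes at page 13’s successor and row 1302, rather than at page 1 or at a guess.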

How to handle invalid JSON payloads and attachment uploads in adjacent steps

Adjacent errors can mimic pagination gaps because your job fails before later pages are processed, even though pagination itself is fine.

  • Schema failures: a single malformed record can cause a transformation crash that stops the run; log and quarantine bad records instead of halting the entire job.
  • Payload issues: if a connector step reports an invalid JSON payload, validate serialization and encoding before the write.
  • Upload steps: if your automation includes file steps and attachment uploads fail, treat that as a separate failure domain and prevent it from blocking pagination progress.

A resilient design isolates concerns: pagination fetch is one loop, data validation is another step, and attachments/uploads are decoupled so one failure doesn’t create “missing records” downstream.

Troubleshooting map: missing records vs delayed tasks vs permission issues

A troubleshooting map reduces time-to-fix by connecting symptoms to the most likely root cause without re-reading your entire codebase.

  • Missing IDs in the middle: offset drift, unstable sorting, token mishandling, or partial-run stop.
  • Last pages missing: job ended early due to timeout/quota; stop condition wrong; cursor not resumed.
  • Data overwritten: wrong range math; constant A1 range; retry overwrote prior pages.
  • Duplicates after rerun: append-on-retry; missing idempotency; overlap windows without upsert.

When you apply this map, you stop guessing: you pick the layer (fetch, transform, write, operations) that matches the symptom and fix the smallest possible root cause.
