Blog & Insights

Sharing my experiences building scalable systems, working with AI, and lessons learned from real-world engineering challenges.

Firebase Hosting Caching
Cloud / Web Mar 29, 2026 3 min read
Cutting Firebase Hosting Costs by ~90% with a Tiny Config Change
A small firebase.json tweak can slash bandwidth usage while still serving fresh deployments correctly. No infra changes. No refactor. Just smarter caching.

The problem

I recently revisited my Firebase Hosting setup and found that without proper caching:

  • Browsers repeatedly download the same JS, CSS, JSON, and image assets on every visit.
  • Bandwidth usage climbs fast.
  • Repeat visits are slower than they need to be.
  • Firebase's free hosting tier gets exhausted quickly.

I was trying to stay inside Firebase's 10 GB free tier while handling about 100k monthly users. Without caching, the math is rough: 100,000 requests × 575 KB per full app download ≈ 57.5 GB, roughly 47.5 GB over the free limit.

The fix

The change is simple but high impact:

  • index.html stays uncached with Cache-Control: no-cache.
  • Static assets get public, max-age=31536000, immutable.

What happens with this setup?

  • The browser always checks index.html, which is a tiny file.
  • Static assets are cached aggressively.
  • Repeat visits reuse cached files instead of re-downloading them.
  • New deployments still work correctly because the HTML points to the latest asset filenames.

Real impact

Before caching: 57.5 GB (100k requests downloading the full app bundle)
After caching: 5.8 GB (most visits reuse cached assets instead of pulling them again)
Estimated reduction: ~90% (comfortably back under Firebase's free hosting tier)

With proper caching in place, returning visitors usually pay only the cost of a tiny index.html check, since their cached assets cost 0 KB to reuse, and only new visitors download the full app. That drops total bandwidth from about 57.5 GB to roughly 5.8 GB.
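The savings can be sanity-checked with a back-of-the-envelope script. The 10% new-visitor share and the ~2 KB cost of the index.html revalidation are illustrative assumptions, not measured values:

```python
# Back-of-the-envelope bandwidth estimate for the caching change.
# Assumptions (not measured): ~10% of monthly requests come from
# first-time visitors; a cached visit re-fetches only index.html (~2 KB).

KB_PER_GB = 1_000_000  # decimal units, as hosting quotas use

requests = 100_000
bundle_kb = 575          # full app download for an uncached visit
html_check_kb = 2        # assumed size of the index.html revalidation
new_visitor_share = 0.10

before_gb = requests * bundle_kb / KB_PER_GB

new_visits = requests * new_visitor_share
repeat_visits = requests - new_visits
after_gb = (new_visits * bundle_kb + repeat_visits * html_check_kb) / KB_PER_GB

reduction = 1 - after_gb / before_gb
print(f"before: {before_gb:.1f} GB, after: {after_gb:.1f} GB, saved: {reduction:.0%}")
```

Under these assumptions the estimate lands near the 5.8 GB figure above; the exact number shifts with the new-visitor share, but the ~90% reduction is robust.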

Why this works

Modern build tools such as Vite, and the bundlers behind React apps, generate hashed asset filenames like main.a1b2c3.js, which becomes main.z9y8x7.js once the code changes. On every deploy, the browser fetches a fresh index.html, sees the new filenames, and downloads only the new files it actually needs.

That means your old files can be cached for a year without becoming dangerous. The cache is tied to the filename, not the conceptual asset. If the filename changes, the browser automatically requests the new one.

Configuration difference

Before caching headers
No explicit cache policy for the HTML shell or static assets, so the browser keeps paying the full bandwidth cost.
{
  "hosting": [
    {
      "target": "staging",
      "public": "public",
      "ignore": [
        "index.html",
        "firebase.json",
        "**/.*",
        "**/node_modules/**"
      ],
      "rewrites": [
        {
          "source": "**",
          "run": {
            "serviceId": "service_id",
            "region": "US-EAST1"
          }
        }
      ]
    },
    {
      "target": "prod",
      "public": "public",
      "ignore": [
        "index.html",
        "firebase.json",
        "**/.*",
        "**/node_modules/**"
      ],
      "rewrites": [
        {
          "source": "**",
          "run": {
            "serviceId": "service_id",
            "region": "US-EAST1"
          }
        }
      ]
    }
  ]
}
After adding cache headers
Keep index.html fresh, cache immutable assets hard, and let hashed filenames handle safe updates.
{
  "hosting": [
    {
      "target": "staging",
      "public": "build",
      "ignore": [
        "firebase.json",
        "**/.*",
        "**/node_modules/**"
      ],
      "rewrites": [
        {
          "source": "**",
          "destination": "/index.html"
        }
      ],
      "headers": [
        {
          "source": "index.html",
          "headers": [{ "key": "Cache-Control", "value": "no-cache" }]
        },
        {
          "source": "static/**",
          "headers": [{ "key": "Cache-Control", "value": "public, max-age=31536000, immutable" }]
        },
        {
          "source": "**/*.@(js|css|json|webp|png|jpg|svg)",
          "headers": [{ "key": "Cache-Control", "value": "public, max-age=31536000, immutable" }]
        }
      ]
    },
    {
      "target": "prod",
      "public": "build",
      "ignore": [
        "firebase.json",
        "**/.*",
        "**/node_modules/**"
      ],
      "rewrites": [
        {
          "source": "**",
          "destination": "/index.html"
        }
      ],
      "headers": [
        {
          "source": "index.html",
          "headers": [{ "key": "Cache-Control", "value": "no-cache" }]
        },
        {
          "source": "static/**",
          "headers": [{ "key": "Cache-Control", "value": "public, max-age=31536000, immutable" }]
        },
        {
          "source": "**/*.@(js|css|json|webp|png|jpg|svg)",
          "headers": [{ "key": "Cache-Control", "value": "public, max-age=31536000, immutable" }]
        }
      ]
    }
  ]
}

Why a one-year cache is safe

The browser may cache main.a1b2c3.js for a year, but when a new deploy ships and index.html points to main.z9y8x7.js, the browser requests the new file automatically. No stale code. No broken updates.

Bonus: atomic deploys

Firebase Hosting uses atomic deploys, so the new index.html and the new asset files go live together. That avoids broken references between the HTML shell and the built assets.

Key takeaway

Caching is not just a performance optimization. It directly affects cost, speed, and deployment safety. A few lines in firebase.json can save real money and improve UX at scale. If you are using Firebase Hosting without proper caching, you are probably overpaying.

Fine-Tune Your Own LLM
AI / ML Feb 11, 2026 2 min read
How to Train an Open-Source LLM with Your Own Dataset — A High-Level Guide
New to AI development and unsure where to start? This step-by-step guide walks you through preparing a custom dataset, fine-tuning Llama 3.2 on Hugging Face, and using the trained model in your own codebase.

Introduction

Are you new to AI development? Struggling with where to start? Or do you want to train a model on your own dataset and actually use it? In this post, I'll give you a high-level overview so you can get started right away.

Here's our goal: we want to build a question-answering system. Note — this is not RAG (Retrieval-Augmented Generation). Instead, we will fine-tune an open-source model on our own dataset, host it on a server, and access it from anywhere using credentials. Keep in mind that this is a high-level overview; I can't explain every detail in a single post, but I'm confident you'll walk away with a clear understanding of how to train and use a model.

Step 1 — Prepare Your Dataset

First, start with a PDF file. Extract all the text from the PDF and split it into chunks of roughly 1,000 words each.

Next, use any LLM (such as GPT or Gemini): send each chunk to the model and ask it to generate five possible questions based on that chunk. Now you have your questions from the LLM, and you have the chunk data as the corresponding answers.

Create a dataset in JSONL (JSON Lines) format. Here's an example of what the output looks like:

{"answer":"The Amazon rainforest is the largest tropical rainforest in the world. It covers much of northwestern Brazil.","question":"Which country contains most of the Amazon rainforest?"}
{"answer":"The Amazon rainforest is the largest tropical rainforest in the world. It covers much of northwestern Brazil.","question":"What type of forest is the Amazon rainforest?"}

Step 2 — Fine-Tune the Model on Hugging Face

Now it's time to fine-tune an open-source model. Follow these steps:

  1. Go to HuggingFace.co.
  2. Search for "Llama 3.2 3B Instruct" and open the model page.
  3. Log in (or sign up), then click Request Access.
  4. Wait approximately 10 minutes for Meta's approval.

After approval, start fine-tuning via AutoTrain:

  1. On the model page (top right), click Train → AutoTrain.
  2. Click Create New Project.

Configure the AutoTrain Space:

  • Space Name: e.g., test-model-llama-3.2
  • Description: e.g., Testing Llama 3.2
  • Space SDK: Docker
  • Docker Template: AutoTrain (leave as default)

Select Hardware (this is important):

  • Choose NVIDIA A10G Small (~$1/hour). Fine-tuning a 3B-parameter model needs a substantial amount of GPU memory, and this tier works reliably.

Set Important Parameters:

  • Pause on failure → Set to 0 (this enables detailed error logs).
  • Visibility → Private
  • License → Leave empty

Then click Create Space. Hugging Face will spin up the Docker container and prepare the training environment.

Configure Training Inside AutoTrain:

  1. Set the Project Name (e.g., test-llama).
  2. Select the Base Model: Llama 3.2 3B Instruct.
  3. Upload your training file — it must be a .jsonl file (the dataset you prepared earlier).
  4. Set Number of Epochs to 1.
  5. Leave all other hyperparameters (learning rate, batch size, optimizer, scheduler) as default — do not change them.

Click Start Training. Once training completes, the fine-tuned model will appear under your Hugging Face profile as a new model.

Step 3 — Use Your Model in Code

Now you can load and use your fine-tuned model directly in Python:

from huggingface_hub import login
from transformers import AutoModelForCausalLM, AutoTokenizer

# Authenticate so the private fine-tuned model can be downloaded
login(token="YOUR_HF_TOKEN")

model_path = "YOUR_MODEL_NAME"

tokenizer = AutoTokenizer.from_pretrained(model_path)

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",
    torch_dtype="auto"
).eval()

messages = [
    {"role": "user", "content": "Which country contains most of the Amazon rainforest?"}
]

# Build the prompt with the model's chat template
input_ids = tokenizer.apply_chat_template(
    conversation=messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

# Move the inputs to whichever device the model was placed on (GPU or CPU)
output_ids = model.generate(
    input_ids.to(model.device),
    max_new_tokens=500
)

# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(
    output_ids[0][input_ids.shape[1]:],
    skip_special_tokens=True
)

print(response)

Final Thoughts

Your model will now answer questions based on your training data. You might face a few difficulties along the way, but trust me — if you use any AI assistant (Gemini, ChatGPT, etc.), it will help you resolve errors and clear up any confusion. Happy coding!