Bill's Writings

Launching smolcrawl - Simple Web Scraping for LLMs and Knowledge Bases Tuesday, Apr 29, 2025

I'm excited to introduce smolcrawl, a new, lightweight Python tool designed to simplify web scraping and knowledge base creation. SmolCrawl streamlines the process of extracting, organizing, and searching web content, making it ideal for developers, researchers, and anyone looking to build personal knowledge collections.

Repetitions and Practice Monday, Apr 21, 2025

The older I get, the more I see repetitions and practice as critical pillars. It's straightforward but challenging. The reason is that you're buying into something that compounds rather than something that's immediately rewarding.

Some thoughts on GTM Friday, Apr 18, 2025

These are some reflections I've had over the past couple of months on go-to-market for products.

Infinite Monkeys Wednesday, Apr 16, 2025

AI + Platform Gravity Tuesday, Apr 15, 2025

There's an open question that I am chewing on that I'd like to hear your perspective on.

2 steps back, one step forward Monday, Apr 14, 2025

This conversation, or topic rather, came about with a good friend.

Builder Tactics - 15 for 15 Friday, Mar 1, 2024

As product builders, we're constantly making sure we're building the right thing.

Hyperlint - AI to Help Write and Maintain Great Documentation Friday, Feb 9, 2024

Over the past couple of months, I've been working on a new project called Hyperlint.

do you buy groceries every week? Monday, Oct 23, 2023

I do.

Do not water it down Tuesday, Oct 17, 2023

Optionality and volatility Friday, Oct 13, 2023

One of the challenges with optionality is volatility.

Building Conviction Thursday, Oct 12, 2023

Where's your bad work? Wednesday, Oct 11, 2023

Show me your bad work.

Do you have the runway? Tuesday, Oct 10, 2023

At a previous startup there were times of stress and challenge. Bickering, indecision, lack of perspective. It happens at most startups at one point or another.

Mind Games on the Trail Monday, Oct 9, 2023

The mind games. Life is just mind games. Whether it's team dynamics. Whether it's just you vs you. It's all mind games.

Rendering Markdown in Nuxt 3 & Vue 3 Wednesday, Sep 27, 2023

When working on my Scrappy Startup project using Vue 3, I encountered a need to render markdown. This markdown could either be fetched from a database or written inline within my application. Markdown, with its ease of writing and readability, serves as an excellent format for managing text-based content, especially when you have a considerable amount of textual data to handle.

Sniplet.xyz - Deep Search Podcasts to Find Relevant Snippets Friday, Sep 22, 2023

Sniplet.xyz is a tool that allows you to search deep into podcasts for relevant snippets or podcasts that you might want to listen to.

The Scrappy Startup - The Reverse Product Template Tuesday, Sep 19, 2023

Sometimes, creating a Press Release / FAQ can be a bit heavyweight. I wrote this template to write punchier proposals that allow for more testing and iteration. The goal is to prove or disprove ideas and document my process for doing so.

Amazon's Press Release FAQ Template Sunday, Sep 17, 2023

The following is the template for Press Release - FAQs as popularized by Amazon. This template is here as a resource for others to use.

Chat With Your Data using LangChain Thursday, Aug 10, 2023

Note: See the accompanying GitHub repo for this blog post here.

So You Want to Join a Startup by David Henke Friday, Jul 21, 2023

The following is a memo that David Henke wrote in 1998. It was a formative article for me and has helped me make serious decisions about my career and where I chose to work. I asked him if I could reproduce it, since I couldn't find it online, and he obliged. Here's what he had to say about it...

Realestatecopywriter.io - AI Powered Real Estate Copywriting Tuesday, Jul 18, 2023

TenantFlow AI - AI Powered Leasing Agent for Residential Property Managers Thursday, Jul 6, 2023

The Next Step in the Journey Wednesday, Jun 14, 2023

This post will be subject to change and evolution. It represents the starting point for me 'starting up'.

Angel Mistake - you're not the operator Monday, Jun 12, 2023

Snorkeling & Scuba Diving - An Analogy Relevant for Starting Up Thursday, Jun 1, 2023

Car Rental Companies and Branding - What sharing desks teaches us about product management Tuesday, Jul 23, 2019

You decide that you're going to make a trip, a business trip. You're going to visit some customers and you hop onto whatever search engine and reserve a car, maybe through National. You get to the destination airport, stroll off the aircraft, grab your bag and walk to the car rental counters only to realize that Enterprise and National share the same desk.

Applying SaaS Company Metrics to Product Adoption Friday, Jan 11, 2019

Recently, there's been a renewed focus on monitoring and understanding company (or product) growth, especially when it comes to SaaS products. Werner Vogels recently mentioned something quite similar in a blog post, "People often ask me if developing for the cloud is any different from developing on-premises software. It really is." I couldn't agree more. It's awesome for understanding products, how users are using them, and what you can do to improve them.

Thoughts on Shutting Down Projects and Looking to 2019 Monday, Dec 24, 2018

After having sparktutorials.net up for several years, it's time to shut it down. I haven't written for the site in years at this point and it's not doing me any good now that I have The Definitive Guide published.

Thoughts on 'The Black Swan' by Nassim Taleb Friday, Feb 16, 2018

This was my second time reading The Black Swan by Nassim Taleb although admittedly I think I was a bit young the first time to fully absorb the content. That is not to say that I didn't get the TL;DR of "hey sometimes stuff happens that you can't predict that's meaningful", but what I missed was a lot of the nuance in the actual application of the principles to my life.

Spark: The Definitive Guide published by O'Reilly! Thursday, Feb 8, 2018

As of February 6th, 2018, Spark: The Definitive Guide has gone to print. This was the most intensive project and process that I've ever undertaken in my life. It was filled with frustrations and anticipations, excitements and fears. I must extend thanks to those who encouraged me to lead the writing of the book, namely Ion Stoica, Patrick Wendell, Ali Ghodsi, and (somewhat obviously) Matei Zaharia. These folks were the ones who recommended that I take the lead on the book, and I am forever grateful to them for granting me such an opportunity.

Getting Started with Apache Spark Sunday, Dec 6, 2015

Lately I've been playing around with Spark for data processing. It provides some really amazing features like MLlib and Spark SQL, and there's no better way to learn something than to use it. I've attended a couple of meetups about Spark and its related tools, including the famous ampcamp put on by the developers of Spark and, although I'm not an expert, I thought it would be good to consolidate my knowledge and teach others.

Introducing SparkTutorials.net Thursday, Sep 10, 2015

I've recently launched a website called SparkTutorials.net. Spark Tutorials aims to educate the general public about the utility of Spark as a tool for data science. I would encourage you to read more on the website and learn something new!

A Simple Link Shortener in Scala Wednesday, Jun 10, 2015

Recently I took it upon myself to dive into Scala. This post describes what my reaction was after writing a link shortener service using it. For those only interested in the code, check out my GitHub.

A Simple Link Shortener in Clojure Monday, Jun 1, 2015

Recently I took it upon myself to dive into Clojure. This post describes what my reaction was after writing a link shortener service using it. For those only interested in the code, check out my GitHub.

Visualizing Crime in San Francisco during the 2014 World Series Wednesday, May 20, 2015

During the World Series, especially during the Giants' win, there was mass rioting and looting. For our data visualization class, a classmate, John Semerdjian, and I made an interactive visualization of the crime in the city during each game.

Plotting Your AWS Redshift Data with Plotly Friday, May 8, 2015

Plotting Spark DataFrames with Plotly Monday, May 4, 2015

This was a post that I did for Plotly covering the basics of plotting Spark DataFrames with Plotly.

Visualizing Flights Origins and Departures with d3.js Sunday, Apr 5, 2015

This was built for a class project in my Information Visualization class.

K-Means Clustering - Liquor & Assaults in San Francisco Tuesday, Mar 31, 2015

This notebook walks through an example of KMeans clustering crime data with alcohol license locations. This clustering is performed solely based on the Lat/Long locations of stores and crimes. The tools I use are

Interactive Salesforce Graphing with Plotly Monday, Mar 23, 2015

This was a post that I did for Plotly covering the basics of the tool with Salesforce.

Exploratory Data Analysis of Crime in San Francisco Monday, Mar 16, 2015

Infographic - Why Hasn’t Russia’s Economic Collapse Affected its Leadership? Sunday, Mar 1, 2015

Hackday - Data Science and Docker Working Together Sunday, Jan 18, 2015

This past weekend I was at the wise.io data science hack day and had a great time. The team is clearly intelligent and I really enjoy working and learning in that kind of environment.

Python NLP - NLTK and scikit-learn Wednesday, Jan 14, 2015

This post is meant as a summary of many of the concepts that I learned in Marti Hearst's Natural Language Processing class at the UC Berkeley School of Information. I wanted to record the concepts and approaches that I had learned with quick overviews of the code you need to get it working. I figured that it could help some other people get a handle on the goals and code to get things done.

Data Challenge - Rebalancing Bike Terminals in SF Thursday, Jan 8, 2015

Leada has recently set out to email out new datasets every week with a couple of interesting questions. I thought that this week's challenge posed some interesting questions that provide great examples of ways to use Python's pandas library.

Basic Statistical NLP Part 2 - TF-IDF And Cosine Similarity Monday, Dec 22, 2014

This is a two-part post; you can see part 1 here. Please read that post (if you haven't already) before continuing, or just check out the code in this gist.

Basic Statistical NLP Part 1 - Jaccard Similarity and TF-IDF Sunday, Dec 21, 2014

This is a two-part post; you can see part 2 here.

The Future of Privacy Friday, Dec 5, 2014

Contemporary notions of privacy are complex, and it is common to hear commentators calling the current state of privacy, or lack thereof, unprecedented. I would challenge the notion of an unprecedented violation of privacy on the basis of historical relativity. In absolute terms, there is little question that the world we live in challenges any notions of privacy that have ever existed. However, in relative terms, from a certain level of privacy to another, the rise of newspapers and the telegraph are interesting to compare to the modern era. In this paper I will revisit several key cultural and legal landmarks that have guided us to our current construct of privacy and look at future privacy implications of technologies like Amazon Echo and services like Facebook.

Deploying PostgreSQL for the California Civic Data Coalition's Django Project Tuesday, Nov 25, 2014

First, I'd like to introduce the California Civic Data Coalition. They are self-described as a loosely coupled team from the Los Angeles Times Data Desk, The Center for Investigative Reporting and Stanford's Computational Journalism Lab.

A Gentle Introduction to Static Site Generators Tuesday, Nov 11, 2014

This document will be a simple introduction to static site generators. We'll go over the basics of what they are, why you should use them, which one you should use and finally how to get started.

DataKindSF - Data Analysis for the Greater Good Wednesday, Oct 8, 2014

DataKindSF just got their start and the reception was incredible. There was a huge turnout of people wanting to contribute by using high-impact skills for the greater good. I found out about the program through their meetup.

EverDone - A Project for An Evernote Hackathon Wednesday, Sep 10, 2014

Wow, hackathons are an experience. Firstly, I was amazed by the turnout: realistically, probably 40 teams competed in a 12-hour hackathon for Evernote at the Computer Science Department at Berkeley. I found the atmosphere to be supportive and fiercely competitive at the same time. Hackathons are a strange creation and I've struggled to come up with a parallel in history. But that's for another post.

User Experience Critique - Habits in Timeful Sunday, Aug 31, 2014

Several weeks ago I sent a review of a feature in an app I use called Timeful. This is my letter to that company where I tried to get a better understanding of their motivations for the user experience of part of their application.

SurpriseHaiku - Discovering Unusual Haikus On Twitter Wednesday, Aug 20, 2014

SurpriseHaiku was an experiment that parsed random Twitter tweets to see if they followed the haiku cadence of 5 / 7 / 5 (syllables). I ran this experiment during the 2014 Olympics to try to focus on Olympics-related tweets. This was an application that I built to learn more about the Twitter API and dip my toes in the world of Natural Language Processing, or NLP.