Bill's Writings
As product builders, we're constantly making sure we're building the right thing.
Over the past couple of months, I've been working on a new project called Hyperlint.
One of the challenges with optionality is volatility.
At a previous startup there were times of stress and challenge. Bickering, indecision, lack of perspective. It happens at most startups at one point or another.
The mind games. Life is just mind games. Whether it's team dynamics. Whether it's just you vs you. It's all mind games.
When working on my Scrappy Startup project using Vue 3, I encountered a need to render markdown. This markdown could either be fetched from a database or written inline within my application. Markdown, with its ease of writing and readability, serves as an excellent format for managing text-based content, especially when you have a considerable amount of textual data to handle.
Sniplet.xyz is a tool that allows you to search deep into podcasts for relevant snippets or podcasts that you might want to listen to.
Sometimes, creating a Press Release / FAQ can be a bit heavyweight. I wrote this template to write punchier proposals that allow for more testing and iteration. The goal is to prove or disprove ideas and document my process for doing so.
The following is the template for Press Release - FAQs as popularized by Amazon. This template is here as a resource for others to use.
Note: See the accompanying GitHub repo for this blogpost here.
The following is a memo that David Henke wrote in 1998. It was a formative article for me and has helped me make serious decisions about my career and where I chose to work. I asked him if I could reproduce it, since I couldn't find it online, and he obliged. Here's what he had to say about it...
This post will be subject to change and evolution. It represents the starting point for me 'starting up'.
You decide that you're going to make a trip, a business trip. You're going to visit some customers, so you hop onto whatever search engine and reserve a car, maybe through National. You get to the destination airport, stroll off the aircraft, grab your bag and walk to the car rental counters, only to realize that Enterprise and National share the same desk.
Recently, there's been a renewed focus on monitoring and understanding company (or product) growth, especially when it comes to SaaS products. Werner Vogels recently mentioned something quite similar in a blog post, "People often ask me if developing for the cloud is any different from developing on-premises software. It really is." I couldn't agree more. It's awesome for understanding products, how users are using them, and what you can do to improve them.
After having sparktutorials.net up for several years, it's time to shut it down. I haven't written for the site in years at this point and it's not doing me any good now that I have The Definitive Guide published.
This was my second time reading The Black Swan by Nassim Taleb, although admittedly I think I was a bit too young the first time to fully absorb the content. That is not to say that I didn't get the TL;DR of "hey sometimes stuff happens that you can't predict that's meaningful", but what I missed was a lot of the nuance in the actual application of the principles to my life.
As of February 6th, 2018, Spark: The Definitive Guide has gone to print. This was the most intensive project and process that I've ever undertaken in my life. It was filled with frustrations and anticipations, excitements and fears. I must extend thanks to those who encouraged me to lead the writing of the book, namely Ion Stoica, Patrick Wendell, Ali Ghodsi, and (somewhat obviously) Matei Zaharia. These folks were the ones who recommended that I take the lead on the book, and I am forever grateful to them for granting me such an opportunity.
Lately I've been playing around with Spark for data processing. It provides some really amazing features like MLlib and Spark SQL, and there's no better way to learn something than to use it. I've attended a couple of meetups about Spark and its related tools, including the famous AMP Camp put on by the developers of Spark, and, although I'm not an expert, I thought it would be good to consolidate my knowledge and teach others.
I've recently launched a website called SparkTutorials.net. Spark Tutorials aims to educate the general public about the utility of Spark as a tool for data science. I would encourage you to read more on the website and learn something new!
Recently I took it upon myself to dive into Scala. This post describes what my reaction was after writing a link shortener service using it. For those only interested in the code, check out my GitHub.
Recently I took it upon myself to dive into Clojure. This post describes what my reaction was after writing a link shortener service using it. For those only interested in the code, check out my GitHub.
During the World Series, especially during the Giants win, there was mass rioting and looting. For our data visualization class, a classmate, John Semerdjian, and I made an interactive visualization of the crime in the city during each game.
This was a post that I did for Plotly covering the basics of plotting Spark DataFrames with Plotly.
This was built for a class project in my Information Visualization class.
This notebook walks through an example of KMeans clustering crime data with alcohol license locations. This clustering is performed solely based on the Lat/Long locations of stores and crimes. The tools I use are
This was a post that I did for Plotly covering the basics of the tool with Salesforce.
This past weekend I was at the wise.io data science hack day and had a great time. The team is clearly intelligent and I really enjoy working and learning in that kind of environment.
This post is meant as a summary of many of the concepts that I learned in Marti Hearst's Natural Language Processing class at the UC Berkeley School of Information. I wanted to record the concepts and approaches that I had learned with quick overviews of the code you need to get it working. I figured that it could help some other people get a handle on the goals and code to get things done.
Leada has recently set out to email out new datasets every week with a couple of interesting questions. I thought that this week's challenge posed some interesting questions that provide great examples of ways to use Python's pandas library.
This is a two-part post; you can see part 1 here. Please read that post (if you haven't already) before continuing, or just check out the code in this gist.
This is a two-part post; you can see part 2 here.
Contemporary notions of privacy are complex, and it is common to hear commentators calling the current state of privacy, or lack thereof, unprecedented. I would challenge the notion of an unprecedented violation of privacy on the basis of historical relativity. In absolute terms, there is little question that the world we live in challenges any notions of privacy that have ever existed. However, in relative terms, from a certain level of privacy to another, the rise of newspapers and the telegraph is interesting to compare to the modern era. In this paper I will revisit several key cultural and legal landmarks that have guided us to our current construct of privacy and look at future privacy implications of technologies like Amazon Echo and services like Facebook.
First, I'd like to introduce the California Civic Data Coalition. They are self-described as a loosely coupled team from the Los Angeles Times Data Desk, The Center for Investigative Reporting and Stanford's Computational Journalism Lab.
This document will be a simple introduction to static site generators. We'll go over the basics of what they are, why you should use them, which one you should use and finally how to get started.
DataKindSF just got their start and the reception was incredible. There was a huge turnout of people wanting to contribute by using high-impact skills for the greater good. I found out about the program through their meetup.
Wow, hackathons are an experience. First, I was amazed by the turnout: realistically, probably 40 teams competed in a 12-hour hackathon for Evernote at the Computer Science Department at Berkeley. I found the atmosphere to be supportive and fiercely competitive at the same time. Hackathons are a strange creation and I've struggled to come up with a parallel in history. But that's for another post.
Several weeks ago I sent a review of a feature in an app I use called Timeful. This is my letter to that company where I tried to get a better understanding of their motivations for the user experience of part of their application.
SurpriseHaiku was an experiment that parsed random tweets to see if they followed the haiku cadence of 5 / 7 / 5 syllables. I ran this experiment during the 2014 Olympics to try to focus on Olympics-related tweets. This was an application that I built to learn more about the Twitter API and dip my toes into the world of Natural Language Processing (NLP).
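For readers curious what a 5 / 7 / 5 check might look like, here is a minimal sketch in Python. It is not the original SurpriseHaiku code: the function names `count_syllables` and `split_haiku` are hypothetical, and the syllable counter is a rough vowel-group heuristic (an assumption on my part) that will miscount plenty of English words. A pronunciation dictionary would do better.

```python
import re
from typing import List, Optional


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0  # punctuation, numbers, emoji and the like
    count = len(re.findall(r"[aeiouy]+", word))
    # A trailing silent "e" (as in "silence") usually adds no syllable,
    # but "-le" endings (as in "table") usually do.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def split_haiku(text: str) -> Optional[List[str]]:
    """Split text into lines of 5, 7 and 5 syllables, or return None."""
    # Ignore tokens with no countable syllables.
    words = [w for w in text.split() if count_syllables(w) > 0]
    targets = [5, 7, 5]
    lines: List[str] = []
    current: List[str] = []
    total = 0
    for word in words:
        if len(lines) == len(targets):
            return None  # words left over after the third line
        current.append(word)
        total += count_syllables(word)
        if total > targets[len(lines)]:
            return None  # a word straddles a line boundary
        if total == targets[len(lines)]:
            lines.append(" ".join(current))
            current, total = [], 0
    # Valid only if all three lines were filled with nothing left over.
    return lines if len(lines) == len(targets) and not current else None


if __name__ == "__main__":
    tweet = "An old silent pond A frog jumps into the pond Splash! Silence again."
    print(split_haiku(tweet))
    # ['An old silent pond', 'A frog jumps into the pond', 'Splash! Silence again.']
```

The sketch requires line breaks to fall on word boundaries, so a tweet with 17 syllables whose fifth syllable lands in the middle of a word is rejected.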