Mashgin: The Future of Computer Vision

About a year ago I invested in and joined a startup called Mashgin. In this post I want to talk a little about what we’re working on.

Mashgin is building a self-checkout kiosk that uses cameras to recognize multiple items at once without needing barcodes.

The current version of the kiosk is designed for cafeterias, where customers can slide their tray up and everything on it is recognized instantly. Excluding payment, the process takes around 2 seconds. No more waiting for a single line held up by a price check!

But retail checkout is just a package around Mashgin’s core technology. We believe there is an opportunity to apply recent technical advancements to many fields. Advancements such as:

  • Smartphone dividends — cheap sensors and ubiquitous, miniaturized electronic components
  • Cheap parallel processing power including low-cost GPUs
  • An explosion in collaborative, open-source software tools
  • Machine learning methods, in particular convolutional neural networks (a byproduct of the 2 preceding trends)
  • Cheap cloud infrastructure

Chris Dixon talks more about some of these trends in his post What’s Next in Computing?

So how is Mashgin applying this technology?

Adaptive Visual Automation

Face swap: billionaire edition

Computer vision transforms images into usable data (descriptions) using software. If cameras are the “eyes” of a machine, computer vision would be the brain’s visual cortex, processing and making sense of what it sees.

When computers know what they’re looking at, it opens up a world of potential. You can see it in existing use cases from facial recognition in Facebook photos (…or face swap apps) to Google Image Search and OCR. Newer, much more sophisticated applications include driverless cars, autonomous drones, and augmented reality.

A visual example of using gradient descent (the reverse of hill climbing in a fitness landscape) as part of the learning process of a neural network

These recent applications tend to be more complex, and as a result use machine learning in addition to traditional image processing methods. Machine learning, and in particular deep learning with neural networks, has changed the game in many areas of computer science, and we are just beginning to see its potential. ML can distill a large amount of data into a single model. As the name implies, it can learn and adapt to new information over time with little or no “teaching” from engineers.

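To make that concrete, here’s a minimal sketch in Python (mine, not Mashgin’s code) of the gradient descent update mentioned in the caption above: a single parameter is nudged repeatedly in the direction that reduces the error, which is the same basic rule used to train a neural network’s weights.

```python
import random

random.seed(0)

# Toy data: y is roughly 3 * x plus a little noise.
data = [(x / 10, 3 * (x / 10) + random.gauss(0, 0.1)) for x in range(1, 11)]

w = 0.0    # the single weight we are learning
lr = 0.1   # learning rate (how big a step to take downhill)

for step in range(200):
    # Gradient of the mean squared error with respect to w:
    # the average of 2 * (w * x - y) * x over the data.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad  # move w a small step down the error surface

print(f"learned w = {w:.2f} (the true slope is 3)")
```

A real vision model has millions of weights instead of one, but the learning loop is the same idea.
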
Both CV and ML can be applied to many fields, but one of the biggest immediate needs is in Automation. There are a surprising number of visual tasks, simple for humans, that are ripe for automation. This includes industrial use cases in manufacturing and distribution, and consumer use cases in household robotics and the relief of everyday bottlenecks.

I call the above combination adaptive visual automation: using machine learning to automate vision-based tasks. Although relatively new, this combination covers a large and quickly growing class of real-world problems. Autonomous cars (and especially trucks) are a good up-and-coming example that will have huge ramifications.

Mashgin’s future

Mashgin uses adaptive visual automation to improve the speed, accuracy and cost of applications in recognition, measurement, and counting in a closed environment. That was a bit of a mouthful, so here’s the short version: Mashgin wants to make visual automation intelligent.

There’s a broader category of AI vision companies whose purpose is giving computers the ability to understand what they see. Mashgin is a subset of this group, focusing on automating well-defined real-world problems.

There are further subsets such as eliminating bottlenecks in everyday circumstances — speeding up checkout lines being one example. In many of the activities you do on a daily basis, intelligent automation has the ability to save a huge amount of time and money.

Retail checkout is a big market (even for just cafeterias) but it only scratches the surface of the value Mashgin will eventually be capable of. We have already established a foundation for applying recent advancements to these problems and it will only get better from here.

Atlastory: Mapping the history of the world

Certain ideas are “inevitable” over time. Paul Graham calls them “[squares] in the periodic table” — if they don’t exist now, they’ll be created shortly. It’s only a matter of when, not if.

I believe that Atlastory is one of those ideas. The following is a long post about a project I’ve been passionate about for some time now and am currently in the process of winding down.

The Idea

Atlastory is an open source project to create an interactive map that chronicles the history of life on earth. It’s a “Google Maps” for history. The ultimate goal is the ability to see what the world looked like 50, 200, 1000+ years ago. It was inspired by OpenStreetMap & Wikipedia: combining historic maps with cultural & statistical data.

Atlastory map in action

I started Atlastory at first because I’m a fan of both history and good data visualizations. I was surprised something like this didn’t already exist and thought that it would be an amazing educational tool.

Maps are one of the best ways to clearly show an enormous amount of information. Since everything in the past took place at a certain time and location, maps are an obvious choice to visualize that knowledge. Understanding history requires seeing changes and interactions over time, and a four-dimensional map allows this.

“To envision information—and what bright and splendid visions can result—is to work at the intersection of image, word, number, art.” — Edward Tufte

Good design will be a key aspect of the final product. Good information design can communicate a huge amount of knowledge in a small window of time or space. Great information design has a high amount of density and complexity while remaining completely understandable.

The Vision (version ∞)

Atlastory’s purpose is to improve understanding of the past by organizing and visualizing historic knowledge.

My vision for Atlastory was that one day it would become a tool like Wikipedia that’s used regularly around the world. A journalist could use it to go back 20 years to see the geography and timeline of a major world event. A student could use it to go back 20,000 years to see the expansion of human culture across the globe. A climatologist could use it to visualize the historic overlap of population growth with changes in global climate patterns.

Wikipedia organizes information by creating a searchable network of interconnected articles that combine text and other multimedia. Atlastory can be the first medium that allows completely visual navigation, displaying information at a much higher density and level of interactivity.

World map, 1937

Imagine students in a classroom learning about World War II. You’d be able to see the country borders of Europe as they existed in 1942. Drag the timeline, and see the borders change as the years go on. Turn on an overlay of population density or GDP per capita and see the flow of activity throughout the war. Zoom in and see the troop movements of a pivotal battle.

The visual interactivity would make it much more enticing for people, young and old. Almost game-like in terms of exploration and discovery.

Eventually, the timeline could go back far enough that you’re able to see continental drift and other pre-historic geographic or environmental changes.

Map content

Maps can be broken down into a few different types:

  • Physical — shows the physical landscape including mountains, rivers, lakes.
  • Political — sovereign, national and state boundaries, with cities of all sizes. The typical world map you see will be political with some physical features.
  • Road — shows roads of various sizes along with destinations and points of interest. Google Maps & other navigation apps fall into this category.
  • Statistical — shows statistics about human populations such as economic stats, population density, etc.
  • Scientific — thematic maps that can show climate, ecological regions, etc. (see the climate map below)
  • Events — shows how a specific event played out geographically, like WWII or Alexander the Great’s conquests.

Climate patterns

Any map type that has enough data to span long periods could eventually go into the Atlastory system. Event, thematic, statistical, and scientific maps could all seamlessly layer on top of the main “base map”.

Base map

The Atlastory base map should be an elegant combination of three map types: physical (basic landscape features), political (sovereign and administrative boundaries), and cultural (see below). Major roads and infrastructure would be added only after a worldwide “structure” of the base map was created.

Importantly, map creation should be top down, from global to local. The purpose of an Atlastory map is not navigation, it is understanding of history. Creating a global structure will also provide context and make it easier to interest other users/contributors.

Cultural cartography

Most world maps made today (of the present time or of the last few hundred years or so) are of the political variety. But what happens when you go back a few thousand years? What about areas of the world that, even now, aren’t necessarily defined by geopolitical boundaries?

The solution is mapping cultural regions. Culture, in this case, means human societies with a common language, belief systems, and norms. “A cultural boundary (also cultural border) in ethnology is a geographical boundary between two identifiable ethnic or ethno-linguistic cultures.”

A cultural map would have different levels, just like political maps: from dominant cultural macroregions to local divisions between subcultures or classes within a society (blue collar vs. white collar, etc.).

Combining cultural cartography with typical map types allows for a much better understanding of both modern and ancient history. Culture plays a major role in world events, and limiting the map to formally defined borders paints an inaccurate view of history.

Cultural regions

(Notice any overlap between cultural regions and the climate regions in the map above it?)

The Tech

The technical infrastructure behind Atlastory has a few basic components:

  1. A database of nodes (latitude/longitude points) organized into shapes, layers, types, and time periods (a minimal sketch of this data model follows the list).
  2. An API that manages, imports and exports data from the database.
  3. A crowdsourced map editor interface (like iD for OpenStreetMap, but designed specifically for top-down, time-based editing).
  4. A map rendering service that turns raw map data from the database into vector tiles that can be styled for viewing.
  5. The map itself: a web interface to view and navigate the maps.

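To make the first two components a bit more concrete, here’s a minimal sketch in Python of the kind of time-aware data model described above. The names and fields are illustrative assumptions, not Atlastory’s actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Node:
    """A single latitude/longitude point."""
    id: int
    lat: float
    lon: float

@dataclass
class Shape:
    """An ordered set of nodes forming a border, river, region, etc."""
    id: int
    node_ids: List[int]   # references to Node ids, in drawing order
    layer: str            # e.g. "political", "cultural", "physical"
    kind: str             # e.g. "country", "culture region", "river"
    start_year: int       # first year this shape is valid
    end_year: int         # last year this shape is valid

def shapes_for_year(shapes: List[Shape], year: int) -> List[Shape]:
    """What the API or renderer would ask for: everything valid in a given year."""
    return [s for s in shapes if s.start_year <= year <= s.end_year]
```

Dragging the timeline in the viewer would then amount to re-querying for a new year and re-rendering the vector tiles.
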
Most of the components would be built from existing open-source tools created by organizations like OpenStreetMap, MapBox, and CartoDB. There has been a lot of technical innovation in this field over the past few years, which is one of the main reasons something like Atlastory is now possible to build. (Although, given what I know about the requirements, it would still be very challenging.)

Read more about the technical requirements…

The current status and future of Atlastory

I’ve been working on this as a side project for more than 3 years now. Originally I imagined being able to quickly find a way to profit from the service. But as development dragged on and other commitments began taking up more of my time, I realized I’d never be able to finish it alone.

Earlier this year I joined Mashgin, a startup in the Bay Area, as a full-time “Generalist.” My spare time completely dried up and I decided everything needed to be completely open sourced and distributed to anyone interested in the project.

Due to personal time constraints, I can’t continue with it so I’m looking for others who are interested. This could mean taking over / adapting the codebase or using other means to pursue the idea. See below for more details on what’s currently done. Although many of the back-end components are functional, the infrastructure is in a rather unusable state right now.

Please contact me or leave a comment below if this piques your curiosity or you know anyone else who would be interested. I’m happy to answer any questions.

Resources

The dawn of immersive storytelling

From a previous #tweetstorm:

Immersive storytelling will be a big industry in the near future: movies viewed on the Oculus Rift, dome-like cinemas, or interactive games. We have companies like Jaunt, Condition One & (consumer) making 360° cameras that will be used for filming.

A new visual “grammar” will have to be discovered by filmmakers through trial and error (i.e. no fast cuts, super close-ups, etc.). Parts of the legacy film industry will rebel at first, as they have over the last 100 years since storytelling evolved from live performances to filmed, pre-recorded stories.

Just like audiences were frightened at the sight of a train barreling towards them in early theaters, there will be a learning curve for immersive experiences. Early players of demo games for the Oculus Rift have been scared to the point of ripping their headsets off. Dome cinemas could be the social alternative to VR headsets. (If you’ve ever been on Disney’s Soarin’ Over California ride, that’s an example.)

Technology-wise, I feel a complete 360° field of view (FOV) like this Jaunt setup won’t be the way to go. There has to be some direction to the audience’s attention. A complete FOV is too immersive and incompatible with users’ prior experiences. Maybe at some point down the road. Something like a 180-220 degree horizontal FOV, plus 180 degrees vertically, would allow some freedom of motion (immersion) while still directing the view, paired with surround sound.

There is a lot of experimentation ahead in both technology and storytelling grammar. I look forward to both observing and participating.

Book Notes: Innovation and Entrepreneurship

As with my other book notes, some passages are direct quotes and others are my own paraphrasing/summaries. Any footnotes or [brackets] are my personal comments.

Innovation & Entrepreneurship (1985), by Peter Drucker

“The entrepreneur,” said the French economist J. B. Say around 1800, “shifts economic resources out of an area of lower and into an area of higher productivity and greater yield.”

All new small businesses have many factors in common. But to be entrepreneurial, an enterprise has to have special characteristics over and above being new and small. Indeed, entrepreneurs are a minority among new businesses. They create something new, something different; they change or transmute values. An enterprise also does not need to be small and new to be an entrepreneur. Indeed, entrepreneurship is being practiced by large and often old enterprises.

The entrepreneur upsets and disorganizes. As Joseph Schumpeter formulated it, his task is “creative destruction.” They see change as the norm and as healthy. Usually, they do not bring about the change themselves. But—and this defines entrepreneurship—the entrepreneur always searches for change, responds to it, and exploits it as an opportunity.

When shifting resources to a more productive area, there is a risk the entrepreneur may not succeed. But if they are even moderately successful, the returns should be more than adequate to offset whatever risk there might be. One should thus expect entrepreneurship to be considerably less risky than optimization. Indeed, nothing could be as risky as optimizing resources in areas where the proper and profitable course is innovation, that is, where the opportunities for innovation already exist. Theoretically, entrepreneurship should be the least risky rather than the most risky course. [There are “hidden” risks of not being an entrepreneur.]

“Innovation,” then, is an economic or social rather than a technical term. It can be defined the way Say defined it, as changing the yield of resources. Or, as modern economists would tend to do, it can be defined in demand terms rather than in supply terms: changing the value and satisfaction obtained from resources by the consumer.

Mental Model: Fitness Landscapes

UPDATE (September 2020): I wrote a more in-depth, detailed explanation of fitness landscapes and how they can be applied across disciplines. The original (shorter) version is still below but I’d recommend the latest one for a better understanding of the model.


Fitness landscapes are used to visualize the relationship between genetic makeup (genotype) and evolutionary fitness (the ability to survive and reproduce). A fitness landscape is a vast landscape divided into a grid of billions of squares. Each square represents a genotype—some squares represent birds, some fish, some humans, and the majority represent all the variations of genetic possibility that couldn’t survive in reality. Each square is very similar to its neighbors: two of the same species with a small variation, or two different but related species. The closer the squares, the more similar the genotypes; the further apart, the more different. The fitness of each genotype is represented by its height on the landscape. Valleys represent low fitness, mountain peaks high fitness.

Fitness Landscape

Over time, species tend to move up the landscape to the nearest peak (A), where all future paths of variation lead downward. The peak that a genotype “settles” on is most likely to be a local optimum, which is not necessarily the highest peak in the landscape (a global optimum). This is because selection pushes fitness towards nearby peaks (what is called a basin of attraction), but lacks the foresight to select the highest peak.

To get to a higher peak, a species may have to reduce its fitness in the near term (C) as it slowly traverses a valley in order to improve fitness in the long term. In order to make this shift, there has to be sufficient instability or challenge; otherwise, an organism will not opt to leave the intermediate peak and suffer the unknown prospects of the valley. If the valley is too low or the higher peak too far away, it may be unreachable because the low-fitness hurdle can’t be overcome. (An example is the lack of wheeled animals: although wheels would be beneficial, they are inaccessible due to the valley of low-fitness genotypes around them.)

Evolution usually moves in small steps, but occasionally it takes wild leaps—a single mutation might give a creature an extra pair of legs or another radically different feature. Most of the time these leaps result in much lower fitness (B), and therefore don’t last. But sometimes a leap allows the genotype to jump to a higher peak without the slow process of going down before going up.

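As a toy illustration of these dynamics (my own sketch in Python, not part of the original model), the snippet below climbs a rugged one-dimensional landscape: small mutations find the nearest local peak, while rare large leaps occasionally land in the basin of a higher one.

```python
import math
import random

def fitness(x: float) -> float:
    # A rugged landscape on [-2, 2] with several peaks of different heights.
    return math.sin(3 * x) + 0.5 * math.sin(7 * x)

def evolve(x: float, steps: int = 5000, leap_rate: float = 0.01) -> float:
    for _ in range(steps):
        # Mostly small variations; occasionally a wild leap.
        sigma = 1.5 if random.random() < leap_rate else 0.03
        candidate = max(-2.0, min(2.0, x + random.gauss(0, sigma)))
        # Selection keeps a variant only if it is at least as fit.
        if fitness(candidate) >= fitness(x):
            x = candidate
    return x

random.seed(7)
start = -1.0
end = evolve(start)
print(f"fitness went from {fitness(start):.2f} to {fitness(end):.2f}")
```

With leap_rate set to zero, the walk typically stalls on whichever local peak is closest to the starting point.
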
Every landscape has different terrain that can be on a scale from flat to rugged. A rugged or coarse landscape has many local peaks and deep valleys, while a flat landscape has only very small hills (all genotypes have about the same success rates).

Landscapes don’t remain static—they shift over time due to either environmental changes or adjustments as organisms move across them. The movement can vary from stable (relatively flat and slow to change) to roiling (likely rugged and changing quickly). Given the likelihood of ever-shifting landscapes, the evolutionary mix of small steps and occasional wild leaps is the best possible way to adapt to the environment.

Berkshire Hathaway Letters to Shareholders

I’m excited to announce the release of a book I’ve been working on for about 6 months now, and first started in 2010.

It’s a compilation of every letter Warren Buffett wrote to the shareholders of Berkshire Hathaway. I first created it a few years ago for myself and friends. Last year I got Buffett’s endorsement — plus a few non-public letters — to publish the book for the benefit of fans and shareholders of Berkshire.

Here is the official page with all the details. There you can find a more detailed description, plus some sample pages and a chart detailing the performance of Berkshire’s insurance operations. (For any programmers out there, the chart was created with D3. You can check out the development version on GitHub.)

Features of the book:

  • Berkshire Hathaway annual shareholder letters from 1965 to 2012 (706 pages), including the 11 earliest letters not available on Berkshire’s website
  • Tabulated letter years so you can easily flip to the desired letter
  • Topics index
  • Company index
  • Person index
  • Charts of:
    • The growth in Berkshire’s book value and market price relative to benchmarks
    • Insurance float and performance
    • The operating businesses of Berkshire

The entire book is paginated and has easy-to-flip-to labels for each letter’s year.

It is available for pre-order now. The first batch will be sold at the Berkshire Hathaway Annual Meeting on May 4 in the convention center. The rest of the copies will be available on Amazon on May 7.

Future projects

  • The obvious next step is to publish a digital version, easily readable on iPads or potentially Kindles. This is normally an easy transfer, but that’s not the case with this book due to the many tables that have to be converted. So there’s no timeline on this yet, but it will be forthcoming.
  • A book of letters to the partners of Buffett Partnership, Ltd., the hedge fund Buffett ran from 1957 to 1970. This will be a similar format to the Berkshire book, with indexes, page numbers, etc.

Steve Jobs on learning to code

From Robert X. Cringely’s “Steve Jobs: The Lost Interview”:

When we were designing our blue box, we wrote a lot of custom programs to help us design it, you know, and to do a lot of the dog work for us in terms of calculating master frequencies with subdivisors to get other frequencies and things like that. We used the computer quite a bit to calculate, you know, to calculate how much error we would get in the frequencies and how much could be tolerated.

So we used them in our work, but much more importantly, it had nothing to do with using them for anything practical. It had to do with using them to be a mirror of your thought process; to actually learn how to think.

I think the greatest value of learning how to—I think everybody in this country should learn how to program a computer—should learn a computer language, because it teaches you how to think. It’s like going to law school. I don’t think anybody should be a lawyer, but I think going to law school would actually be useful, because it teaches you how to think in a certain way, in the same way that computer programming teaches you in a slightly different way how to think. And so I view computer science as a liberal art.

Google Glass and the Segway Paradox

Google Glass

The customer rarely buys what the company thinks it is selling him. — Peter Drucker

Google Glass was finally announced to the public yesterday.

Glass is a solution looking for problems. It’s too hard to say what jobs-to-be-done Glass will be hired to do at this stage, or how widely used it will be. We’ll only know after it’s released.

The lean startup way of thinking heavily emphasizes the reverse sequence: find a problem (job), think of ways to solve it, test your hypothesis using a minimum viable product, repeat. This method should work for most startups. It worked well for companies like Microsoft (Problem: I need an Operating System to put on the computers I sell so people can use them. Solution: Build/Buy/Copy Basic/DOS/Windows).

But there are some innovations where the solution => problem sequence is necessary — anything that requires a lot of R&D and isn’t easily demoed on a large scale. Google Glass, Tesla cars, Segway, iPad, Lytro, etc. These are physical, more capital-intensive examples, but the same still holds for some smaller software projects. Sometimes you just need to build the full version to see what it’s best used for.

One of the problems of this method is what I call the “Segway Paradox”: a new technology with huge initial interest and possibilities turns out to only be used in a few niche cases.

This may happen for a number of reasons (see Paul Graham’s The Trouble with the Segway). I think Google Glass may fall prey to this problem.

There are a few use cases I can think of that may make Glass worth the cost:

  1. Hands-free sports — biking, skiing, football, climbing
  2. Search & rescue, emergency — alerting the user to visual/audio anomalies
  3. Jobs that require detailed visual instructions (“advanced checklists”)

But it seems from the videos that Google is focusing more on everyday consumer uses, competing more with smartphones.

Mistakes = information

Mistakes can help us learn

Re-posted from the Atlastory blog.

In Nassim Taleb’s new book “Antifragile,” there’s an interesting segment about how an entire system can be antifragile (benefiting from variability / disorder / stressors) precisely because its individual parts remain fragile (harmed by variability). A few examples:

The engineer and historian of engineering Henry Petroski presents a very elegant point. Had the Titanic not had that famous accident, as fatal as it was, we would have kept building larger and larger ocean liners and the next disaster would have been even more tragic. So the people who perished were sacrificed for the greater good; they unarguably saved more lives than were lost. . . . Every plane crash brings us closer to safety, improves the system, and makes the next flight safer.

Thankfully the errors we encounter while developing Atlastory don’t involve anyone dying. But the same principle applies — every bug, problem, server crash, chokepoint, or design flaw we encounter leads to a better system. We want to run into problems, because that means we know about them and can now fix them — eventually making the user experience better as a result.

“Some businesses love their own mistakes,” Taleb continues. “Reinsurance companies, who focus on insuring catastrophic risks . . . manage to do well after a calamity . . . All they need is to keep their mistakes small enough so they can survive them.”

The more you benefit from low-downside mistakes, the more “antifragile” your business is. I see this as a function of both the industry you’re in and the internal culture of the company.

If everyday work and life is viewed as a science experiment (the circle of observe > guess > test > interpret), then any screw-ups or failures are a good thing in the end. You know something’s wrong, and you can work on fixing it. Taleb again: “…every attempt becomes more valuable, more like an expense than an error. And of course you make discoveries along the way.”

Continual improvement is everyday life in software development, but it is only just catching on for personal development.


How to separate luck and skill

These are some of my notes from the book “The Success Equation” by Michael Mauboussin. This book was spotted on Warren Buffett’s desk in this tour of his office. There’s lots more interesting stuff in the book, but these notes in particular answer the question “How do you separate luck and skill?” We’ll start off with some definitions:

Luck is a chance occurrence that affects a person or a group (e.g., a sports team or a company). Luck can be good or bad. Furthermore, if it is reasonable to assume that another outcome was possible, then a certain amount of luck is involved. In this sense, luck is out of one’s control and unpredictable. Randomness and luck are related, but there is a useful distinction between the two. You can think of randomness as operating at the level of a system and luck operating at the level of the individual. Luck is a residual: it’s what is left over after you’ve subtracted skill from an outcome.

The definition of skill depends on how much luck there is in the activity. In activities allowing little luck, you acquire skill through practice of physical or cognitive tasks. In activities incorporating a large dose of luck, skill is best defined as a process of making decisions. Here, a good process will have a good outcome but only over time. Patience, persistence, and resilience are all elements of skill.

Separating luck and skill

Luck-Skill Continuum
At the heart of making this distinction lies the issue of feedback. On the skill side, feedback is clear and accurate, because there is a close relationship between cause and effect. Feedback on the luck side is often misleading because cause and effect are poorly correlated in the short run.

In most cases, characterizing what’s going on at the extremes is not too hard. As an example, you can’t predict the outcome of a specific fair coin toss or payoff from a slot machine. They are entirely dependent on chance. On the other hand, the fastest swimmer will almost always win the race. The outcome is determined by skill, with luck playing only a vanishingly small role.

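As a rough illustration of that feedback point (my own toy model in Python, not from the book), the sketch below treats each outcome as a weighted blend of skill and luck. When luck dominates, a short stretch of results says very little about who is actually more skilled.

```python
import random
import statistics

def season_average(skill: float, luck_weight: float, games: int) -> float:
    """Average outcome over a short season, where each outcome blends skill and luck."""
    outcomes = [
        (1 - luck_weight) * skill + luck_weight * random.uniform(0, 1)
        for _ in range(games)
    ]
    return statistics.mean(outcomes)

random.seed(1)
strong, weak = 0.8, 0.4  # true underlying skill levels

for luck_weight in (0.1, 0.9):  # skill-dominated vs. luck-dominated activity
    wins = sum(
        season_average(strong, luck_weight, 10) > season_average(weak, luck_weight, 10)
        for _ in range(1_000)
    )
    print(f"luck weight {luck_weight}: the more skilled player comes out ahead "
          f"in {wins / 10:.0f}% of 10-game seasons")
```

In the luck-heavy case the better player finishes ahead only a little more often than a coin flip, which is exactly why short-run feedback is so misleading at that end of the continuum.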