Say you’re a university loan administrator. You have one loan and two students, who anonymized seem to you in every way identical: same GPA, same payment history, all that good stuff. You can’t decide. You ask a data-driven startup to determine which one’s the greater risk to default or pay late. You have no idea how they do it, but it comes back —
The answer’s clear: Student A!
Congratulations, you’ve just perpetuated historical racism.
You didn’t know it. The startup didn’t know it: they evaluated both students and found Student A’s social networks are stable, and their secondary characteristics are all strongly correlated with future prosperity and loan repayment. Student B scores less well on social network, and their high school, home zip code, interests, and demographic data are associated with significantly higher rates of late payments and loan defaults.
From their perspective, they’re telling you — accurately, most likely — which of those two is the better loan risk.
No one at the startup may even know what the factors were. They may have grabbed all the financial data they could get from different sources, tied it to all the demographic data they could find, and set machine learning on the problem, creating a black box that they show is significantly better than a loan officer or risk analyst at guessing who’s going to default. No one
Machine learning goes like this: you feed the machine a sample of, say, 10,000 records, like
You set the tools on it and it’ll find characteristics and combinations of characteristics that it associates with the outcomes, so that if you hand it a new record (Angela, a 22-year-old from Boston, unmarried, doesn’t like Corgis), your black box says “I’m 95% sure they’ll default.”
It’s the ultimate in finding correlation and assuming causation.
You see how good it is by giving it sets of people where you know the outcome and see what the box predicts.
You don’t even want to know what the characteristics are, because you might dismiss something that turns out to be important (“People who buy riding lawnmowers buy black drip coffee at premium shops? What?”).
Because machine learning is trained up on the past, it means that it’s looking at what people did while being discriminated against, operating at a disadvantage, and so on.
For instance, say you take ZIP Codes as an input to your model. Makes sense, right? That’s a perfectly valid piece of data. It’s a great predictor of future prosperity and wealth. And you can see that people in certain areas are fired more from their jobs, and have a much harder time finding new ones, and so default on payments more often. Is it okay using that as a factor?
Because America spent so long segregating housing, and because those effects continue forward, using ZIP means that given ZIP X I’m 80% certain you’re white. Or 90% if you’re in 98110.
We don’t even have to know, as someone using the model, that someone is black. I just see that people in that zip code predict defaults, or being good on your loan. Or I might not even know that my trained black box loves ZIP Codes.
And if you can use address information to break it down to census tract and census block, you’re even better at making predictions that are about race.
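To make the proxy problem concrete, here is a minimal sketch of how you might check how much a single input like ZIP Code gives away about race. Everything in it is an assumption for illustration: the function name, the record layout, and especially the toy records, which are invented to mirror the 80% and 90% figures above and are not census data.

```python
from collections import Counter, defaultdict


def group_share_by_zip(records):
    """Map each ZIP to (most common group, that group's share of records).

    `records` is an iterable of (zip_code, group) pairs. A share near 1.0
    means the ZIP alone nearly reveals the group, even if the model is
    never handed the group directly.
    """
    counts = defaultdict(Counter)
    for zip_code, group in records:
        counts[zip_code][group] += 1

    shares = {}
    for zip_code, groups in counts.items():
        group, n = groups.most_common(1)[0]
        shares[zip_code] = (group, n / sum(groups.values()))
    return shares


# Invented toy records, sized to echo the percentages in the text.
toy_records = (
    [("X", "white")] * 8 + [("X", "black")] * 2
    + [("98110", "white")] * 9 + [("98110", "black")] * 1
)

shares = group_share_by_zip(toy_records)
for zip_code, (group, share) in sorted(shares.items()):
    print(f"ZIP {zip_code}: {share:.0%} {group}")
```

Running the same check on census tract or block instead of ZIP would, if the point above holds, push those shares even higher.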
This is true of so many other characteristics. Can I mine your social network and connect you directly to someone who’s been to jail? That’s probably predictive of credit suitability. Oh — black people are ~9 times more likely to be incarcerated.
Are your parents still married? Were they ever married? That’s — oh.
Oh no! You’ve been transported back in time. You’re in London. It’s 1066. William, Duke of Normandy, has just now been crowned. You have a sackful of gold you can loan out. Pretty much everyone wants it, at wildly varying interest rates. Where do you place your bets?
William, right? As a savvy person, you’re vaguely aware that England has a lot of troubles ahead, but generally speaking, you’re betting on those who hold wealth and power to continue to do so.
What about, say, 500 years later? Same place, 1566. Late-ish Tudor period. You’re putting your money on the Tudors, while probably being really careful not to actually remind them that they’re Tudors.
Betting on the established power structure is always a safe bet. But this means you’re also perpetuating that unjust power structure.
Two people want to start a business. They’re equally skilled. One gets a loan at 10% interest, the other at 3%. Which is more likely to succeed?
Now is the bank even to blame for making that reasonable business decision? After all, some people are worse credit risks than others. Is the bank to disregard a higher profit margin by being realistic about the higher barriers that minorities and women face? Don’t they have a responsibility to their shareholders to look at all the factors?
That’s seductively reasonable. To see this at scale, look at America’s shameful history of housing discrimination. Blacks were systematically locked out of mortgages and house financing, made to pay extremely high rates without building mortgages, never building equity. At the same time, their white counterparts could buy houses, pay them off, and pass that wealth to their kids. Repeat over generations. Today, about a third of the wealth gap in families, where white families have over $100,000 in assets and minorities almost nothing, comes from the difference in home ownership.
When we evaluate risk based on factors that give us race, or class, we handicap those who have been handicapped generation after generation. We take the crimes of the past and ensure they are enforced forever.
There are things we must do.
First, know that your data will reflect the results of the discriminatory, prejudiced world that produced them. As much as possible, remove or adjust factors that reflect a history of discrimination. Don’t train on prejudice.
Second, know that you must test the application of the model: if you apply your model, are you effectively discriminating against minorities and women? If so, discard the model.
Third, recognize that a neutral, prejudice-free model might seem to test worse against past data than it will in the future, as you do things like make capital cheaper to those who have suffered in the past. Be willing to try and bet on a rosier future.
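For the second point, one possible test is to compare approval rates across groups. The sketch below uses the "four-fifths" screen from US employment-selection guidance, which flags a problem when the lowest group's rate falls under 80% of the highest. Borrowing that threshold for lending is my assumption, not something this post prescribes, and the function names and toy decisions are made up for illustration.

```python
def approval_rates(decisions):
    """Return {group: approval rate} from (group, approved) pairs."""
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {group: approved[group] / totals[group] for group in totals}


def disparate_impact_ratio(decisions):
    """Lowest group approval rate divided by the highest."""
    rates = approval_rates(decisions)
    best = max(rates.values())
    if best == 0:
        return 1.0  # nobody was approved, so there is no gap to measure
    return min(rates.values()) / best


def passes_four_fifths(decisions, threshold=0.8):
    """Assumed screen: ratio must be at least `threshold` (default 0.8)."""
    return disparate_impact_ratio(decisions) >= threshold


# Invented model output: group A approved 8 of 10, group B 4 of 10.
toy_decisions = (
    [("A", True)] * 8 + [("A", False)] * 2
    + [("B", True)] * 4 + [("B", False)] * 6
)

ratio = disparate_impact_ratio(toy_decisions)
print(f"ratio = {ratio:.2f}, passes = {passes_four_fifths(toy_decisions)}")
```

A single ratio is a blunt instrument, and a model can pass it while still being unfair in other ways, so treat a pass as the start of the testing and a fail as a reason to discard the model.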
Citations on wealth disparity:
I’ve taken on Product Management for a set of internal tools, and found myself lost in 700-some open tickets (including meta-tickets and sub-tickets and all that goodness). Product’s a relatively new discipline at the company, the tools team is saddled with technical debt and severely resource-constrained, and my early discussions with internal customers ran strong with discontent.
As a fan of Melissa Perri generally and “Rethinking the Product Roadmap” in particular, I wanted to try seeing if a Problem Roadmap meeting would help.
I hoped a problem roadmap would give us all agreed-on, prioritized problems we could evangelize and pursue, going from being ticket-or-project focused (700 tickets!) to outcome-focused, and start reducing the iteration time from months to weeks and soon, days. Then I’d be able to start culling that backlog like crazy and lining up ideas and bugs against outcomes we were pursuing, and we’d all have clear success metrics we could look to.
I invited members of the development team and a cross-section of interested people in the support organization for two hours. We ended up with ~12 people.
To start, I presented the goals for the company that related to the discussion: where did we need to get to with customer satisfaction overall, and our goals specific to our customer support organization.
I introduced what we were trying to do in the meeting, along with an example problem and a metric that could trace it. On the giant whiteboards, I drew two columns for problem and metric to measure it.
Then I asked “What are the problems we’re facing getting to our goals?”
Early on, our conversations were specific: “Bug X is hurting us,” which in turn led to “Oh, we’re working on that” (which I was guilty of). We’d come up with metrics to measure those and move on. As we filled each whiteboard up, I’d move to the next board (or take pictures and erase all three).
We quickly moved to larger issues, and the discussions got into new, interesting problems I knew we weren’t already discussing. Which led to eager participants jumping to “how we could fix that.” This was challenging: when do you bring that back, and when do you let it run?
I’d explain (or reiterate) that once we’d defined problems and metrics, we’d vote and then pursue solutions. But some of the ideas were so good, it was hard to rein them in.
With more problems, we got better at defining the metrics we’d use, and it led to a focus I hadn’t seen in other meetings trying to address this. In some cases, needing metrics meant reconsidering what we thought the problem meant, sometimes discovering there was more than one problem.
New, more specific descriptions often illuminated issues there’d long been angst but not clarity around, and the metrics provided a way for us to target them. For example, a problem that a tool didn’t work right resulted in us defining three issues: workflows, tool design, and then the technology, all with metrics. That clarity would have made this worth doing on its own.
Requiring metrics forced uncomfortable discoveries that we didn’t have useful measurement against our goals, which I’d also have held the meeting just to find out.
Towards the end, we’d gotten to amazing discussions I hadn’t yet seen. New problems just under the company and organizational goals, and in considering those, a problem around whether the organization was structured to pursue our larger goals.
I’ll offer two examples of the kind of problems we came up with early, and then later as conversation opened up:
|Problem|Metric|
|Portland coffee is 10% worse than Seattle’s|Survey of employee satisfaction; dev team velocity; chats answered/hour|
|We can’t see if we’ve met our goals if we can’t measure them|Yes/no: are there metrics in place that measure x/y/z?|
Yup. 90 minutes from Bug A (Bug A, measure Error A) to sweeping, actionable metrics (Organizational issue B, employee satisfaction, workflow measurement, other good stuff).
Then the voting. I re-wrote the list from the photos for space, and then gave everyone five votes, multi-votes okay. Here’s what happened:
|Problem|Votes|
|Huge existential thing we’d never talked about|9|
|Large systemic issue with banky thing A|5|
|Large systemic issue with banky thing B|4|
… followed by a long tail of 2s and 1s.
We’d never talked about the top item before! Anywhere! It wasn’t on a roadmap! It wasn’t in any of the 700 tickets! Brand new! I’m using exclamation points!
Both the problems and metrics we came up with around the 2nd-and-3rd place priorities clarified huge problem clouds (dozens of tickets filed against something with different solutions or issues, all without metrics or an overall goal).
That’s gold. I’m so happy.
I’d recommend this approach to anyone doing Product (or Program) Management looking for a way to re-focus conversation on outcomes. I’ll report back on how evangelizing the findings goes. The discussions it inspires around the problems, and how to measure them, made it worthwhile.
I can see where it might be less valuable if you’re tightly bound to prescribed work… but also, I can see where it might help you break out from that.
Now, some random miscellany.
Logistical challenges running the meeting:
Questions I’m considering for next time:
“I love talking to new users, they’re so full of wonder.” — Simple person
After two days at Simple, some random thoughts:
They’re unbelievably dedicated to customer service. Part of why I signed on was that I knew this, but I keep finding it’s greater than I thought.
Their customer service people, who are here, in Portland, have no scripts.
They’re free to listen to and help people however they can.
This is so great.
Then, when a co-worker was walking me through some parts of the application, he’d show me something and then some tiny edge case where it was so clear that someone had sweated the details. I said “that is so cool” repeatedly as he demonstrated things.
They’ve managed this under nearly unbelievable constraints, being in the banking industry. And I spent five years at the start of my tech career in telecom. That the product is as surpassingly good as it is is amazing.
I filed my first two bugs against the site today, which should surprise no one who knows me. My first bug was yesterday, against an internal tool.
(Written because all the Google Search results I could find were spam)
Desktop only, b/c I’m specifically looking at things you can use while not internet-connected.
How I evaluated: I mapped out my thinking on some simple project ideas, which meant a lot of entering new text, and then I tried taking notes on a presentation, which added more navigation and on-the-fly reorganization.
$20, free trial available
Pretty sweet, easy to use, pretty, simple look to it, surprising depth once you start to poke around in what you can attach and do with stuff. I dig it. The free trial is
The companion iOS apps look pretty good too, though I’m satisfied with iThoughts there.
I also don’t get why it’s MindNode Pro: wasn’t putting “Pro” on your app a thing you did when there were free and paid versions, like that phase where we named the paid versions “HD”?
Free, Pro version available
I liked this a lot, though it seemed to drain battery like crazy. The Plus and Pro versions are targeted towards businesses (Gantt View, only $99 more for a limited time), with the only thing you might need being some of the export tools. But you’re probably fine.
$15, free trial available
I love Scrivener, and I love how Literature & Latte runs their shop — my experiences with them over the years have been uniformly good.
Big difference between this and others is that Scapple defaults to a map without hierarchy: you’re not starting with a central topic and building out. Everything’s an island and then you link them up, toss a picture in, whatever. For that, it’s great, and the ease of use is good.
My problem is that for the stuff I most frequently use mind mapping for, I just could not seem to make it work fast enough: where I’m cruising along hitting tab/enter and typing as I go, Scapple never allowed me to get into that flow. I really wanted to like it more, and might return to it for doing writing brainstorming.
Mind Manager $350. Plus subscription stuff.
Targeted at the enterprise that can buy and negotiate licensing fees, I guess. I’ve played around with the free version, and it seemed great but over-featured for personal use. But I don’t need another project management collaboration tool set, and I don’t have $350.
Free. Built in Java. It looks like dated open-source software for Windows 3.11, but seems to work okay. Requires Java. Java kills batteries dead.
You can of course take any sufficiently advanced graphing program and use it, or a plug-in, or whatever. My experience is that they’re too heavy, and actually way harder to use than XMind or MindNode.
Of them, OmniGraffle seemed the least-horrible option for mind-mapping, and is also a pretty great diagramming application in general. It is, however, $99 for the standard version and $199 for the business-y version with Visio support and fancier export options. On the plus side, OmniGroup are a great bunch of people, with amazing customer support.
“It is important to communicate to stakeholders that early calculations using velocity will be particularly suspect. For example, after a team has established some historical data, it will be useful to say things like, ‘This team has an average velocity of 20, with a likely range of 15 to 25.’ These numbers can then be compared to a total estimate of project size to arrive at a likely duration range for a project. A project comprising a total of 150 story points, for example, may be thought of as taking from 6 to 10 sprints if velocity historically ranges from 15 to 25.”
— Mike Cohn, Succeeding with Agile 2010.
This requires that you have the project’s scope and cost in story points clear at that point, which you won’t have. Story point estimation takes time to achieve any useful consistency, and especially to do the kind of learning (“last time we took something like this on, it was an 8, not a 3”) that gets you to numbers you might want to rely on.
It depends on scope being set — and how often is that true, particularly on agile projects where you’re able to show users the product at the end of each sprint and make adjustments?
More importantly, say these teams are relying on the release date of a project:
If you deliver early, that’s great: they have more time to rehearse and edit, and they control whether they want to release early.
If you’re late, they need to know as soon as possible if they need to start cutting schedule or scope, or spend more to keep those constant and meet a new deadline.
Then take Cohn’s example. If my company is lining everything up behind a date, and my status email is
“We have 150 story points remaining, and the team’s current velocity is 20 per two-week sprint, with a range of 15-25, so we should deliver between 20 weeks from now and 12 weeks, and probably around 16 weeks, which is on time.”
People would rightly stand up from their desks and hurl the nearest team-building trophy at me.
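For what it’s worth, the arithmetic behind that status email is simple enough to sketch. This is only an illustration of the calculation, not something from Cohn’s book: the function names are mine, and rounding up to whole sprints is an assumption (it’s what turns 7.5 sprints at a velocity of 20 into “around 16 weeks”).

```python
import math


def sprints_needed(points, velocity):
    """Whole sprints needed to burn down `points` at a given velocity."""
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    # Assumption: work only ships at a sprint boundary, so round up.
    return math.ceil(points / velocity)


def delivery_forecast(points, v_low, v_avg, v_high, sprint_weeks=2):
    """Return (best, likely, worst) delivery estimates in weeks."""
    best = sprints_needed(points, v_high) * sprint_weeks
    likely = sprints_needed(points, v_avg) * sprint_weeks
    worst = sprints_needed(points, v_low) * sprint_weeks
    return best, likely, worst


# The numbers from the example: 150 points, velocity 15 to 25, average 20.
best, likely, worst = delivery_forecast(150, v_low=15, v_avg=20, v_high=25)
print(f"{best} to {worst} weeks, probably around {likely}")
```

The math is fine. The problem is that an eight-week spread is what comes out the other end.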
At the same time, putting more information about the probabilities and how you calculated them isn’t going to help. You can’t talk about the power law or the Hurst Exponent here. The question people want answered is: do I need to take action?
You’re going to end up reconciling this to some organizational standard, like:
Which actually means for PMs:
And for your customers, eventually means:
Then what do you say in your one-sentence “summary” at the top of your status report, the one thing people will read if they open it at all?
My best results came from straight team votes at the end of the sprint: ask “Do you feel confident we’ll hit our date?”
80% or more thumbs up? Green.
Under 50%? Red.
This requires that the team trusts they can give their honest opinion and not be pressured. If that’s not the case, do the rest of the PM world a favor and leave our ranks.
Ask, get the number, and nod. Don’t argue, don’t cheer, just nod, and use that. Then in your status report, you write
“We’re green.” If you want, offer “(87% of team members believe we will meet our target date)”
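If you wanted to turn the vote into the status line mechanically, it might look something like this. The green and red cutoffs are the ones above; calling everything in between “yellow” is my assumption for the sketch, as are the function names.

```python
def status_color(thumbs_up, team_size):
    """Map an end-of-sprint confidence vote to a status color."""
    if team_size <= 0:
        raise ValueError("team_size must be positive")
    if not 0 <= thumbs_up <= team_size:
        raise ValueError("thumbs_up must be between 0 and team_size")

    share = thumbs_up / team_size
    if share >= 0.8:
        return "green"
    if share < 0.5:
        return "red"
    return "yellow"  # assumption: the in-between band


def status_line(thumbs_up, team_size):
    """One-sentence summary for the top of the status report."""
    color = status_color(thumbs_up, team_size)
    share = thumbs_up / team_size
    return (
        f"We're {color}. "
        f"({share:.0%} of team members believe we will meet our target date)"
    )


print(status_line(13, 15))
```

The point of keeping it this dumb is that nobody gets to argue with the number: you ask, you count, you report.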
Now, you do want to express the uncertainty, but in a way that people can use. Think of yourself as a weather forecaster. Do you tell people to bring umbrellas? Put on sunscreen?
Hurricane forecasting has a useful graphic for this:
Where on the calendar does your project land, and with what certainty? Assume that you ship every week on Friday. Why not
Distance from the calendar is how far out you are, and the shading indicates likeliness. Mark the release deadline with red (or a target, or…).
Daniel Worthington offered this:
Which is even better.
So two questions for you — 1) How do you effectively measure uncertainty on a project’s delivery date? 2) How do you convey that so it’s simple and easy to act on?
In which I whine about tools.
I’m sure for developers who work with Ruby the maze of tools, installs, and dependencies is like water to a fish. Except, judging just from work, the fish complain a lot about it too. For me, though, it’s so fucked up I want to throw something at it.
Let’s say I want to install Ruby and Rails on my Mac.
% brew install ruby
% gem install rails
(bombs on crazy UnknownHost error)
% gem install rails
(vwhoosh! installs a ton of stuff)
% rails --version
Rails is not currently installed on this system. To get the latest version, simply type:
$ sudo gem install rails
You can then rerun your “rails” command.
Okay, so let’s figure out how to do that. Stack Overflow answer says this is a command line tools issue. Do I have the command line tools? I’ll check — Crap, it returns a “Can’t install software because it is not currently available in the Software Update server” error. That makes no goddamn sense.
Back to Stack Overflow… ah! That’s a bug. I guess I’m actually fine. BLARGH.
Except rails will install and then tells me it didn’t. Still haven’t figured that one out.
And hey — as in Python, there’s a huge set of version differences!
Q: Hey, I’m a normal dude trying to get this thing up, which version of Ruby do I —
Q: Thanks for that.
To deal with all these dependencies, I can use rvm which… I don’t even know! Woo!
This seriously makes me want to go back to wrestling with Python. Or set fire to my computer.
Dungeon Keeper’s catching flak for its terribleness, and today’s story is their new implementation of review skimming.
Check out their writeup.
Here’s their picture of where this goes beyond anything I’d seen before:
Where, say, OkCupid asks “love us?” and “leave feedback” to try and skim off only the people who’ll leave good reviews, Dungeon Keeper’s explicitly asking you what your star rating would be, as if you’re rating it right then (in a window labelled “Rate Your Experience”)(!).
Logical next step would be to present a fake Android rating screen that discards 1-4 star reviews and then submits 5-star reviews on the user’s behalf.
At the other end, you find blatant harassment and tricky language meant to confuse users into capitulating. Modal alert panels might interrupt a user’s workflow at inopportune times, demanding that they either leave a review now or be reminded later to do so.
Part of the frustration is much deeper than that, and goes to a deeply scummy tactic Apple’s let proliferate. I’m going to call this skimming reviews: you pop up a request for a dialogue, but in a way that encourages only people who are going to leave good ones to do it. OkCupid’s app is the clearest example of this:
Check out the levels of ridiculousness:
Right? As much as the cumulative annoyance of “rate us” requests might grind us down, this kind of thing is way more toxic and makes the whole experience seem dirty to me, because it’s not even a sincere request for feedback, but an attempt to turn users into positive reviews.
I don’t understand why this passes App Store review.
I did some napkin math on the overall effect of the new version of Mavericks, and wanted to share. Standard caveats apply: as you’ll see, I’m doing a lot of hand-waving.
So! Thirty million Macs, idling, draw 195 megawatts. Under load, they draw 2,380 MW.
Assume Mavericks gets you a 15% savings in power both at idle and under load.*** If all Macs are at idle, you’re saving 30MW. If all the Macs are running at the same time, you’re saving 357MW.
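Here’s the same napkin math as a few lines of code, so you can swap in your own guesses. The inputs are the figures above; the 15% is still my guess, and the names are just for this sketch. Note the idle saving comes out a touch under the rounded 30MW.

```python
MACS = 30_000_000        # installed base assumed above
IDLE_MW = 195.0          # total draw with every Mac idling
LOAD_MW = 2380.0         # total draw with every Mac under load
SAVINGS_FRACTION = 0.15  # guessed Mavericks saving, idle and load alike


def saved_mw(draw_mw, fraction=SAVINGS_FRACTION):
    """Megawatts saved if total draw drops by `fraction`."""
    return draw_mw * fraction


def watts_per_mac(draw_mw, macs=MACS):
    """Average draw per machine in watts (1 MW = 1,000,000 W)."""
    return draw_mw * 1_000_000 / macs


idle_saved = saved_mw(IDLE_MW)
load_saved = saved_mw(LOAD_MW)

print(f"Idle: {watts_per_mac(IDLE_MW):.1f} W per Mac, {idle_saved:.1f} MW saved")
print(f"Load: {watts_per_mac(LOAD_MW):.1f} W per Mac, {load_saved:.1f} MW saved")
```

The real answer sits somewhere between those two numbers, depending on how many Macs are doing what at any given moment.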
How much is that?
Now, you don’t of course get all the savings at once, and I’m totally omitting how Mavericks affects machines staying in minimal power draw states longer instead of waking, working, sleeping, waking…
I welcome thoughts on how to improve my napkin math and get to a better number.
Even as a rough guess, 80% of a coal plant… that’s pretty awesome.
* I assume this will be low, as new Macs will come with Mavericks installed and replace ones that don’t, and also because with power-saving features, there’s a huge incentive for laptop users to upgrade if they are able but haven’t. Also, it’s free.
** nuke cite http://www.eia.gov/tools/faqs/faq.cfm?id=104&t=3
*** I’m guessing, based on anecdotal reports of battery life improvements, mostly during betas