The Scoop

The Hidden Appeal of GeoDjango

August 23rd, 2008  |  Published in Mapping, django | Comments (0)

One of my tasks this summer was to learn a bit more about the GIS branch of Django, which is now in trunk as a contrib app thanks to the hard work of Justin Bronn, Travis Pinney and Co. Although there are many folks in the CAR community who are quite proficient at mapping technologies, I’m not one of them. I’ve taken a few mapping classes at various conferences over the years, but I haven’t really had a project that required me to use something like ArcView.

But lately I’ve been using GeoDjango for a work-related project (nothing to see yet, but fingers crossed) and it has been a real joy. That’s because its developers, like those of Django itself, have done a lot of work to make things seem pretty effortless. Case in point, and perhaps one of the GIS app’s strongest selling points, is the ease with which you can import spatial data into an app and make use of it. The feature is called LayerMapping, and it’s relatively hidden away in the “Extra Features” section of the GeoDjango wiki.

Here’s how it works: say you’ve got a shapefile (one of the most common spatial formats, particularly when it comes to government-created datafiles) and you want to use it with Django. LayerMapping, along with GeoDjango’s DataSource utility, can help you see what your models might need to look like and then, after you’ve created them, just one dictionary mapping the spatial data columns to your model is all that’s needed to get your data into your database (preferably Postgres). Underneath, the open source GDAL library is doing the heavy lifting, but as with most things Django, you don’t feel the pain at all.
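To make the "one dictionary" idea concrete, here is a rough sketch in plain Python, with no GeoDjango or GDAL involved. It only imitates the shape of a LayerMapping-style dictionary: keys are model field names, and values are either the source column names or, for the geometry field, the geometry type. The field names (name, number, poly, NAME, DIST_NUM) and the apply_mapping helper are made up for illustration and are not part of GeoDjango.

```python
# Plain-Python illustration of a LayerMapping-style dictionary.
# Nothing here touches Django or GDAL; every name below is hypothetical.

# Geometry type names that mark a mapping value as "this is the geometry field".
GEOMETRY_TYPES = {"POINT", "LINESTRING", "POLYGON", "MULTIPOLYGON"}

# Keys: model field names. Values: shapefile column names,
# except for the geometry field, whose value names the geometry type.
district_mapping = {
    "name": "NAME",
    "number": "DIST_NUM",
    "poly": "POLYGON",
}


def apply_mapping(feature, mapping):
    """Turn one shapefile-like feature into keyword arguments for a model."""
    kwargs = {}
    for model_field, source in mapping.items():
        if source in GEOMETRY_TYPES:
            # Geometry field: check the type, then pass the geometry along as WKT.
            if feature["geom_type"] != source:
                raise ValueError(
                    f"expected {source}, got {feature['geom_type']}"
                )
            kwargs[model_field] = feature["wkt"]
        else:
            # Ordinary field: copy the value from the named source column.
            kwargs[model_field] = feature["fields"][source]
    return kwargs


# A stand-in for one record read from a shapefile.
feature = {
    "fields": {"NAME": "District 7", "DIST_NUM": 7, "AREA": 12.5},
    "geom_type": "POLYGON",
    "wkt": "POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))",
}

row = apply_mapping(feature, district_mapping)
print(row)
```

Columns that the dictionary doesn’t mention (AREA here) are simply left out, which is why a single small dictionary is enough to describe the whole import.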

And then it gets better, since you can simply use the Django ORM to access your spatial data, as demonstrated by one of the wiki examples:

qs2 = District.objects.filter(poly__contains='POINT(-95.362293 29.756539)')

It seems almost too good to be true, but it really is that simple. And it makes the occasional pain of installation well worth it. Considering that it’s now a standard part of django.contrib, there’s not much excuse for folks curious about using GIS data not to give it a try.
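For anyone new to spatial queries, the contains lookup in the wiki example asks a point-in-polygon question: which districts have a boundary that encloses this point? The sketch below shows that question in plain Python using the textbook ray-casting test. It is not how GeoDjango answers it (that work is handed off to the spatial database and its libraries), and the rectangle standing in for a district boundary is invented for the example.

```python
# Ray-casting point-in-polygon test, for illustration only.
# GeoDjango delegates this kind of check to the spatial backend;
# this is just the underlying geometric question in a few lines.

def point_in_polygon(x, y, ring):
    """Return True if (x, y) falls inside the polygon described by ring.

    ring is a list of (x, y) vertices; the last vertex is implicitly
    joined back to the first.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


# A made-up rectangular "district" around the point used in the wiki example.
district_ring = [(-96.0, 29.0), (-95.0, 29.0), (-95.0, 30.0), (-96.0, 30.0)]

in_district = point_in_polygon(-95.362293, 29.756539, district_ring)
outside_district = point_in_polygon(-94.0, 29.5, district_ring)
print(in_district, outside_district)
```

An odd number of crossings means the point is inside, an even number means it is outside.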

Six Reasons To Look Past Caspio

August 18th, 2008  |  Published in Journalism | Comments (10)

Be sure to catch the responses from Caspio’s David Milliron at Mindy’s site.

Mindy asks for some bullet points on why news organizations would do better to not use Caspio for their Web database needs. Feel free to add on:

  1. SEO. If you like building databases that are not indexed by Google and other search engines, then Caspio’s right for you. Go ahead, Google “Powered by Caspio”.
  2. Owning vs. Renting. You will never stop paying for Caspio unless you quit it entirely. And then you’ll still need to rewrite your apps. All you’ve gained is more work.
  3. You will need programming. Caspio says “no more programming,” but to do anything beyond basic search and display, you will need some. Oh, but you can’t get access to that functionality during a free trial.
  4. Like using Flash? Caspio doesn’t.
  5. Nickel and Dime. Zip code searching costs $150 to set up and $50 a month.
  6. As my boss and friend Aron says, “We can’t outsource our future.” By choosing Caspio, you’re dependent upon them to add features, and while they do, they add them for all users, too. So much for differentiation.

And here’s a bonus quote from Jacob Kaplan-Moss, one of Django’s lead developers, who admittedly has a bias in this area. But still, it’s a very telling quote: “I’ve actually stopped being all that concerned about Caspio: each new Caspio customer is one more competitor my paper doesn’t have to worry about.”

Now, I’m sure there are six reasons to use Caspio, but I don’t think they stack up in the long term. I think they leave you with more work, not less, and with apps that you have to spend valuable time making look different from those of everybody else who uses Caspio.

Fumblerooski

August 9th, 2008  |  Published in Sports, django | Comments (6)

For reference purposes, you may want to study this old commercial for Reese’s Peanut Butter Cups. Recommended, but not necessary, is this definition.

It’s August, which means that college football is just around the corner. College football is why I don’t volunteer to teach any classes in the fall. It’s why I occasionally compensate my better half for missed Saturday afternoons (although thanks to ESPN 360, I’m not nearly as bad as I was pre-child). So I love college ball, and I love data. That’s where Fumblerooski comes in.

Let me say from the get-go that this is not nearly a finished site. It’s not even halfway there. I’m posting about it now because I’d like to invite people with similar interests to help me build out a site that puts the numbers behind college football front and center. Yes, I have ideas - APIs, for example - but alone Fumblerooski will only ever be so good, and certainly not good enough. That’s why the code behind the site is on github.

The basics: it’s running Django trunk (so, yes, that’s the 1.0 beta candidate right now) and uses MySQL as a backend. Right now I have game results dating back to 1987 for most major schools and spottier coverage for minor ones. In addition, the NCAA releases game-by-game statistics for players and I have some scripts for processing that data, although there’s plenty of room for improvement. Folks who dive into the code may also notice that I started a recruiting dataset as well, but I think that area is well-covered, so it’s not a priority for me at this time. At the moment, Fumblerooski is running on a Joyent 1/2 gig Accelerator with nginx as the Web server.

Most of my work so far has gone towards building out team information. Take my alma mater, Pittsburgh: you can see the results of a given season, check out a series (you can reverse it if you’re one of those WVU fans) or see details of an individual game. The drive chart, which is a fairly recent NCAA addition, is dynamically fetched (and no, I can’t do anything about the colors).

I envision at least two types of contributors: one would help on the coding side with new features (I have plans for aggregate player stuff, but want to wait to see what gets into Django). Another would help with information: fleshing out coaching details, for example. In my wildest dreams, Fumblerooski gets a severely needed makeover as well. Any takers? Feel free to sign up at github, or fork the code, or whatever. You can also contact me if you’d like to help in other ways.

Oh, and the name? It was the best football-only term available, but I also got the blessing of Nebraska alum Matt Waite.

The Birth of Quadruplets, or Understanding the Process

July 22nd, 2008  |  Published in Journalism | Comments (0)

My friend Dave Gulliver had a fascinating piece in his paper on Sunday about the birth of quadruplets in a Sarasota hospital. It’s a great story, but what makes it greater is that it was written by somebody with a certain amount of expertise on the subject of difficult premature multiple births. I hope Dave doesn’t mind, but I’d like to use that story as an example of why understanding the use of data is increasingly important for large swaths of journalism.

There’s a tendency among some folks in the industry to see CAR and other technological tools as just that - blunt instruments. Helpful, sure, but not ultimately necessary to the task of creating journalism. And for a segment of what journalism does, that’s probably ok. When we report on people and institutions that aren’t using technology to guide their decisions or actions, then an understanding of how data or certain technologies are used isn’t a necessity.

I suppose a music critic needn’t understand much about databases, for example, but reporters covering government, business, college or professional sports, to name a few, should be able to assess their subjects the way that people inside those sectors do. And increasingly, that means understanding the use of data. Many local governments base their police staffing - who covers where - on a non-stop flow of crime data. Sports teams pore over tape, logging their opponents’ tendencies in preparation for upcoming games. Businesses are all about the numbers, too.

And then there’s politics. Winning elections these days is very often about putting together enough voters to crack 50%. There’s microtargeting based on consumer data and door-to-door canvassing so that volunteers can input demographic data into centralized servers. They’re not doing that just for fun - it’s valuable information. But if journalists can’t really grasp how organizations are using data, we’re liable to miss the effects, and thus miss some fuller explanations of events. Yes, we can rely on people to tell us what’s happening - and we should - but if data plays a big part in the life of an organization, the reporter covering it should have some basis to evaluate that role.

So how does that relate to Dave’s story about the quads? Well, after reading it, I noticed that there were some subtle bits of detail that I never would have thought to include or been able to describe as well - about how the NICU operates, the details of the births. That’s because Dave has been there with his twin boys. A parent of a child born without complications or a single person would have been hard-pressed to write as good a story. I sure wouldn’t have been able to do so.

It’s the same idea when it comes to understanding the basis for decisions that come from, at least in part, the collection and consumption of data. It can mean the difference between telling a story and telling a better story. I’m sure plenty of organizations that we cover would be happy to have reporters who are in the dark about these things. But that doesn’t help our readers any.

So, technology and data as a tool? Yes. But when the tools become a crucial part of the world we cover, understanding how they work and being able to use them makes us better journalists.

DjangoCon

July 20th, 2008  |  Published in django | Comments (0)

The first-ever DjangoCon will be held Sept. 6-7 at the Googleplex in Mountain View, Calif. The preliminary program looks incredible, and I’m sad to be missing it. My summer travels have been plentiful and another West Coast trip, especially over a weekend, is a bit too much (there’s also the nagging point that I’d have to pay for it myself!). Matt Waite will be there, on a panel discussing Django in journalism, just one of the really strong sessions. If you’re a West Coast CAR person dabbling in frameworks, it’s worth checking out.

But I’m trying to do my part, beginning with a donation to the Django Software Foundation. Doing so will help pay for conferences like DjangoCon, sprints and other activities that help improve the framework, and it’s such a small thing to do considering the benefits I’ve realized from using Django. If you feel the same way, please think about supporting Django.

Previously


Jun 29, 2008
Caspio’s Lessons

by Derek | Read | 6 Comments

Been awhile since I wrote about Caspio, and since then they’ve only gained more media clients, which I suppose could be a lesson for me. But I think not. Rather, I hope what we’ll see in the months and years to come are the lessons that Matt Wynn offers from his experiences using Caspio. Here’s [...]


Jun 19, 2008
The Future of News Libraries

by Derek | Read | 2 Comments

At the recently-completed SLA conference in Seattle, Nora Paul led a session on the “future of news libraries” that asked the attendees to imagine 2012, when librarians (or news researchers, or whatever you want to call them) are recognized as leaders of innovation in newsrooms, and then to explain how that came to pass. It [...]

About The Scoop

Derek Willis’ weblog on investigative and computer-assisted reporting.

©2008 The Scoop