I wrote a post earlier about how to analyze data using Pandas. In that post I introduced some simple Pandas functions and ran them against random data. In this post I am using real-world traffic data from NYC. NYC has recently made a lot of data available to the public on this website, and there is plenty of quality data to play around with. I chose to look at the traffic data, which you can download from here.
I performed the analysis in an IPython Notebook, which I have embedded here.
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', 12)
# Load a csv directly into a dataframe
# parse_dates parameter converts the DATE and TIME columns into DATETIME format, resulting in a combined column called DATE_TIME
# Use a raw string so the backslash in the Windows path is not treated as an escape sequence
df = pd.read_csv(r"c:\NYPD_Motor_Vehicle_Collisions.csv", parse_dates=[['DATE', 'TIME']])
# Remove spaces from column names
cols = df.columns
# Python 3 note: the old `unicode` type is gone; `str` covers all text
cols = cols.map(lambda x: x.replace(' ', '_') if isinstance(x, str) else x)
df.columns = cols
# Delete the columns we don't care about
df = df.drop(['LATITUDE', 'LONGITUDE', 'LOCATION', 'UNIQUE_KEY','ON_STREET_NAME','CROSS_STREET_NAME','OFF_STREET_NAME'], axis=1)
# Let's do some aggregation
# Find total by year
# numeric_only=True keeps newer pandas versions from raising on the text columns
df.groupby(df.DATE_TIME.dt.year).sum(numeric_only=True)
# Now we will look at data for a specific year
# Find total number of persons injured and killed by month in 2014
df[df.DATE_TIME.dt.year == 2014].groupby(df.DATE_TIME.dt.month).sum(numeric_only=True)[['NUMBER_OF_PERSONS_INJURED','NUMBER_OF_PERSONS_KILLED']]
# Sort the data in descending order on NUMBER_OF_PERSONS_KILLED by zip code
# DataFrame.sort() was removed in later pandas; sort_values() is the replacement
df.groupby(df.ZIP_CODE).sum(numeric_only=True).sort_values('NUMBER_OF_PERSONS_KILLED', ascending=False)
# Looks like Zip Code 11236 is the most dangerous
# Graph number of pedestrians/motorist killed by year
df.groupby(df.DATE_TIME.dt.year).sum(numeric_only=True)[['NUMBER_OF_PEDESTRIANS_KILLED','NUMBER_OF_MOTORIST_KILLED']].plot(kind='bar')
# 2013 was the worst year for pedestrian deaths.
# Let's look at year 2014 and see how the deaths varied by borough
df[df.DATE_TIME.dt.year == 2014].groupby(df.BOROUGH).sum(numeric_only=True)[['NUMBER_OF_PEDESTRIANS_KILLED','NUMBER_OF_MOTORIST_KILLED']].plot(kind='bar')
# As we can see, more pedestrians died in Brooklyn than in any other borough.
# Staten Island had the least number of deaths for both pedestrians and motorists. Surprised?
# How did the deaths of pedestrians vary by year in Manhattan?
df[df.BOROUGH == 'MANHATTAN'].groupby(df.DATE_TIME.dt.year).sum(numeric_only=True)[['NUMBER_OF_PEDESTRIANS_KILLED']].plot(kind='bar')
# Distribution of deaths by borough for collisions where VEHICLE_TYPE_CODE_1 is 'PASSENGER VEHICLE'
df[df.VEHICLE_TYPE_CODE_1 == 'PASSENGER VEHICLE'].groupby(df.BOROUGH).sum(numeric_only=True)[['NUMBER_OF_PERSONS_KILLED']].plot(kind='bar')
I will try to do a similar analysis with q sometime in the future (minus the graphs, of course).
I need help please. I tried to run this code in my Jupyter Notebook (Python 3) but it did not work. Please help.
Hi Adam – In order for me to help you, I first need to know what part of the code didn’t work and what error you are getting.
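A likely culprit, without seeing the error: the post was written for Python 2 and an older pandas, so `unicode` no longer exists, `DataFrame.sort()` has been removed, and the non-raw Windows path will raise a syntax error on Python 3. Below is a minimal sketch of the Python 3 / newer-pandas equivalents of those steps, run against a small hypothetical DataFrame standing in for the collisions CSV (the column names match the post; the numbers are made up for illustration):

```python
import pandas as pd

# Tiny stand-in for the NYPD collisions data (hypothetical values)
df = pd.DataFrame({
    'DATE TIME': pd.to_datetime(['2013-05-01 08:00',
                                 '2014-06-02 09:30',
                                 '2014-07-03 17:15']),
    'ZIP CODE': ['11236', '11236', '10001'],
    'NUMBER OF PERSONS KILLED': [1, 2, 0],
})

# Python 3: `unicode` is gone; `str` covers all text
df.columns = df.columns.map(
    lambda x: x.replace(' ', '_') if isinstance(x, str) else x)

# Newer pandas: DataFrame.sort() was removed; use sort_values() instead
by_zip = (df.groupby('ZIP_CODE')[['NUMBER_OF_PERSONS_KILLED']]
            .sum()
            .sort_values('NUMBER_OF_PERSONS_KILLED', ascending=False))
print(by_zip)
```

When loading the real file, also use a raw string for the path, e.g. `pd.read_csv(r"c:\NYPD_Motor_Vehicle_Collisions.csv", ...)`, so `\N` is not interpreted as an escape sequence.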