Case Study: Bike Users Analysis


Background

Cyclistic is a (fictional) bike-share company in Chicago, United States. Cyclistic operates 5,824 bicycles across 692 stations in Chicago. Users can choose from three pricing options: single-ride passes, full-day passes, and annual memberships. Users who buy single-ride or full-day passes are called casual users, while those who pay an annual membership fee are called members. The Marketing team wants to increase company revenue by converting casual users into members. This requires analyzing user behavior, specifically how members and casual users differ in the way they use Cyclistic's service.

By understanding user behavior, the company can choose the right marketing strategy to convert casual users into members.

Step 1: Preparing the Working Environment and Loading the Data into R

To answer this question, this fictional case uses data from the City of Chicago’s (“City”) Divvy bicycle-sharing service. Two bike-usage datasets are available: Q1 2019 and Q1 2020.

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)  # already attached by library(tidyverse); redundant but harmless

Read the CSV files with read.csv and store them in two new data frames named trips_2019_q1 and trips_2020_q1.

trips_2019_q1 <- read.csv("Divvy_Trips_2019_Q1.csv")
trips_2020_q1 <- read.csv("Divvy_Trips_2020_Q1.csv")

Step 2: Combining the Two Data Frames into a Single New Data Frame

2.1. Comparing the variables of the two data frames

colnames(trips_2019_q1)
##  [1] "trip_id"           "start_time"        "end_time"         
##  [4] "bikeid"            "tripduration"      "from_station_id"  
##  [7] "from_station_name" "to_station_id"     "to_station_name"  
## [10] "usertype"          "gender"            "birthyear"
colnames(trips_2020_q1)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
str(trips_2019_q1)
## 'data.frame':    365069 obs. of  12 variables:
##  $ trip_id          : int  21742443 21742444 21742445 21742446 21742447 21742448 21742449 21742450 21742451 21742452 ...
##  $ start_time       : chr  "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
##  $ end_time         : chr  "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
##  $ bikeid           : int  2167 4386 1524 252 1170 2437 2708 2796 6205 3939 ...
##  $ tripduration     : chr  "390" "441" "829" "1,783.00" ...
##  $ from_station_id  : int  199 44 15 123 173 98 98 211 150 268 ...
##  $ from_station_name: chr  "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
##  $ to_station_id    : int  84 624 644 176 35 49 49 142 148 141 ...
##  $ to_station_name  : chr  "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
##  $ usertype         : chr  "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...
##  $ gender           : chr  "Male" "Female" "Female" "Male" ...
##  $ birthyear        : int  1989 1990 1994 1993 1994 1983 1984 1990 1995 1996 ...
str(trips_2020_q1)
## 'data.frame':    426887 obs. of  13 variables:
##  $ ride_id           : chr  "EACB19130B0CDA4A" "8FED874C809DC021" "789F3C21E472CA96" "C9A388DAC6ABF313" ...
##  $ rideable_type     : chr  "docked_bike" "docked_bike" "docked_bike" "docked_bike" ...
##  $ started_at        : chr  "2020-01-21 20:06:59" "2020-01-30 14:22:39" "2020-01-09 19:29:26" "2020-01-06 16:17:07" ...
##  $ ended_at          : chr  "2020-01-21 20:14:30" "2020-01-30 14:26:22" "2020-01-09 19:32:17" "2020-01-06 16:25:56" ...
##  $ start_station_name: chr  "Western Ave & Leland Ave" "Clark St & Montrose Ave" "Broadway & Belmont Ave" "Clark St & Randolph St" ...
##  $ start_station_id  : int  239 234 296 51 66 212 96 96 212 38 ...
##  $ end_station_name  : chr  "Clark St & Leland Ave" "Southport Ave & Irving Park Rd" "Wilton Ave & Belmont Ave" "Fairbanks Ct & Grand Ave" ...
##  $ end_station_id    : int  326 318 117 24 212 96 212 212 96 100 ...
##  $ start_lat         : num  42 42 41.9 41.9 41.9 ...
##  $ start_lng         : num  -87.7 -87.7 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num  42 42 41.9 41.9 41.9 ...
##  $ end_lng           : num  -87.7 -87.7 -87.7 -87.6 -87.6 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...

2.2. Standardizing variable names: rename the variables in trips_2019_q1 to match those in trips_2020_q1

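# rename() takes new_name = old_name.
# bikeid is mapped to rideable_type only so the columns line up; the two hold
# different information (a bike ID versus a bike type).
trips_2019_q1 <- rename(trips_2019_q1,
                        ride_id = trip_id,
                        rideable_type = bikeid,
                        started_at = start_time,
                        ended_at = end_time,
                        start_station_id = from_station_id,
                        start_station_name = from_station_name,
                        end_station_name = to_station_name,
                        end_station_id = to_station_id,
                        member_casual = usertype)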
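For comparison, the same renaming can be sketched in base R with a named lookup vector, without dplyr. This is a minimal sketch on a toy two-row data frame whose values are copied from the str() output above; `toy_2019` and `name_map` are hypothetical names used only for illustration.

```r
# Toy stand-in for trips_2019_q1 (values copied from the str() output)
toy_2019 <- data.frame(
  trip_id    = c(21742443L, 21742444L),
  start_time = c("2019-01-01 0:04:37", "2019-01-01 0:08:13"),
  usertype   = c("Subscriber", "Subscriber")
)

# Hypothetical lookup: old 2019 name -> new 2020 name
name_map <- c(trip_id = "ride_id", start_time = "started_at",
              usertype = "member_casual")

# Rename only the columns that appear in the lookup; leave the rest untouched
hit <- names(toy_2019) %in% names(name_map)
names(toy_2019)[hit] <- unname(name_map[names(toy_2019)[hit]])

print(names(toy_2019))
```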

2.3. Re-checking the renamed columns with str

str(trips_2019_q1)
## 'data.frame':    365069 obs. of  12 variables:
##  $ ride_id           : int  21742443 21742444 21742445 21742446 21742447 21742448 21742449 21742450 21742451 21742452 ...
##  $ started_at        : chr  "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
##  $ ended_at          : chr  "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
##  $ rideable_type     : int  2167 4386 1524 252 1170 2437 2708 2796 6205 3939 ...
##  $ tripduration      : chr  "390" "441" "829" "1,783.00" ...
##  $ start_station_id  : int  199 44 15 123 173 98 98 211 150 268 ...
##  $ start_station_name: chr  "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
##  $ end_station_id    : int  84 624 644 176 35 49 49 142 148 141 ...
##  $ end_station_name  : chr  "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
##  $ member_casual     : chr  "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...
##  $ gender            : chr  "Male" "Female" "Female" "Male" ...
##  $ birthyear         : int  1989 1990 1994 1993 1994 1983 1984 1990 1995 1996 ...
str(trips_2020_q1)
## 'data.frame':    426887 obs. of  13 variables:
##  $ ride_id           : chr  "EACB19130B0CDA4A" "8FED874C809DC021" "789F3C21E472CA96" "C9A388DAC6ABF313" ...
##  $ rideable_type     : chr  "docked_bike" "docked_bike" "docked_bike" "docked_bike" ...
##  $ started_at        : chr  "2020-01-21 20:06:59" "2020-01-30 14:22:39" "2020-01-09 19:29:26" "2020-01-06 16:17:07" ...
##  $ ended_at          : chr  "2020-01-21 20:14:30" "2020-01-30 14:26:22" "2020-01-09 19:32:17" "2020-01-06 16:25:56" ...
##  $ start_station_name: chr  "Western Ave & Leland Ave" "Clark St & Montrose Ave" "Broadway & Belmont Ave" "Clark St & Randolph St" ...
##  $ start_station_id  : int  239 234 296 51 66 212 96 96 212 38 ...
##  $ end_station_name  : chr  "Clark St & Leland Ave" "Southport Ave & Irving Park Rd" "Wilton Ave & Belmont Ave" "Fairbanks Ct & Grand Ave" ...
##  $ end_station_id    : int  326 318 117 24 212 96 212 212 96 100 ...
##  $ start_lat         : num  42 42 41.9 41.9 41.9 ...
##  $ start_lng         : num  -87.7 -87.7 -87.6 -87.6 -87.6 ...
##  $ end_lat           : num  42 42 41.9 41.9 41.9 ...
##  $ end_lng           : num  -87.7 -87.7 -87.7 -87.6 -87.6 ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...

There are two differences between trips_2019_q1 and trips_2020_q1:

  1. Different data types in the ride_id and rideable_type columns: in trips_2019_q1 they are integer, whereas in trips_2020_q1 they are character.

  2. Different labels in member_casual: trips_2019_q1 uses “Subscriber” and “Customer”, whereas trips_2020_q1 uses “member” and “casual”.

The first issue (data types) is fixed before the two data frames are combined, while the inconsistent membership labels are fixed after combining.

2.4. Standardizing data types: convert these columns in trips_2019_q1 to character

trips_2019_q1 <- mutate(trips_2019_q1, ride_id = as.character(ride_id),
                        rideable_type = as.character(rideable_type))

2.5. Stacking the two data frames into a new data frame, “alltrips”, with bind_rows

alltrips <- bind_rows(trips_2019_q1, trips_2020_q1)
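bind_rows() stacks data frames by column name and fills any column that one input lacks with NA. Same-named columns must have compatible types, which is why step 2.4 converts ride_id and rideable_type to character first: bind_rows() stops with an error when it has to combine an integer column with a character column. A rough base R sketch of the stacking behaviour, assuming toy data and a hypothetical helper `stack_fill()`:

```r
# Two toy frames with partly different columns (mirroring 2019 vs 2020)
a <- data.frame(ride_id = c("21742443", "21742444"), tripduration = c("390", "441"))
b <- data.frame(ride_id = "EACB19130B0CDA4A", start_lat = 41.97)

# Hypothetical helper: add missing columns as NA, then rbind by column name
stack_fill <- function(x, y) {
  all_cols <- union(names(x), names(y))
  for (col in setdiff(all_cols, names(x))) x[[col]] <- NA
  for (col in setdiff(all_cols, names(y))) y[[col]] <- NA
  rbind(x[all_cols], y[all_cols])
}

combined <- stack_fill(a, b)
print(combined)
```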

2.6. Removing variables that exist in only one data frame

The variables start_lat, start_lng, end_lat, and end_lng exist only in trips_2020_q1, while tripduration, gender, and birthyear exist only in trips_2019_q1.

alltrips <- alltrips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng, tripduration, gender, birthyear))
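Instead of typing out the columns that exist in only one data frame, they can be derived with set operations on the column names. A minimal sketch with abbreviated, hypothetical column lists (not the full Divvy schemas) and a placeholder data frame:

```r
# Abbreviated column lists standing in for the two schemas
cols_2019 <- c("ride_id", "started_at", "tripduration", "gender", "birthyear")
cols_2020 <- c("ride_id", "started_at", "start_lat", "start_lng", "end_lat", "end_lng")

# Columns present in exactly one of the two lists
only_one <- union(setdiff(cols_2019, cols_2020), setdiff(cols_2020, cols_2019))
print(only_one)

# Placeholder merged frame holding every column, then drop the unshared ones
all_cols <- union(cols_2019, cols_2020)
toy <- as.data.frame(matrix(NA, nrow = 2, ncol = length(all_cols),
                            dimnames = list(NULL, all_cols)))
toy_clean <- toy[, setdiff(names(toy), only_one), drop = FALSE]
print(names(toy_clean))
```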

Step 3: Data Cleaning and Preparing for Analysis

3.1. Inspecting the merged data frame (alltrips)

List of columns

colnames(alltrips)
## [1] "ride_id"            "started_at"         "ended_at"          
## [4] "rideable_type"      "start_station_id"   "start_station_name"
## [7] "end_station_id"     "end_station_name"   "member_casual"

Number of rows (observations)

nrow(alltrips)
## [1] 791956

Dimensions of the data frame

dim(alltrips)
## [1] 791956      9

Inspecting the first and last 6 rows of the data frame

head(alltrips)
##    ride_id         started_at           ended_at rideable_type start_station_id
## 1 21742443 2019-01-01 0:04:37 2019-01-01 0:11:07          2167              199
## 2 21742444 2019-01-01 0:08:13 2019-01-01 0:15:34          4386               44
## 3 21742445 2019-01-01 0:13:23 2019-01-01 0:27:12          1524               15
## 4 21742446 2019-01-01 0:13:45 2019-01-01 0:43:28           252              123
## 5 21742447 2019-01-01 0:14:52 2019-01-01 0:20:56          1170              173
## 6 21742448 2019-01-01 0:15:33 2019-01-01 0:19:09          2437               98
##                    start_station_name end_station_id
## 1              Wabash Ave & Grand Ave             84
## 2              State St & Randolph St            624
## 3                Racine Ave & 18th St            644
## 4      California Ave & Milwaukee Ave            176
## 5 Mies van der Rohe Way & Chicago Ave             35
## 6          LaSalle St & Washington St             49
##                 end_station_name member_casual
## 1      Milwaukee Ave & Grand Ave    Subscriber
## 2 Dearborn St & Van Buren St (*)    Subscriber
## 3  Western Ave & Fillmore St (*)    Subscriber
## 4              Clark St & Elm St    Subscriber
## 5        Streeter Dr & Grand Ave    Subscriber
## 6        Dearborn St & Monroe St    Subscriber
tail(alltrips)
##                 ride_id          started_at            ended_at rideable_type
## 791951 6F4D221BDDFD943F 2020-03-10 10:40:27 2020-03-10 10:40:29   docked_bike
## 791952 ADDAA33CEBCAE733 2020-03-10 10:40:06 2020-03-10 10:40:07   docked_bike
## 791953 82B10FA3994BC66A 2020-03-07 15:25:55 2020-03-07 16:14:03   docked_bike
## 791954 AA0D5AAA0B59C8AA 2020-03-01 13:12:38 2020-03-01 13:38:29   docked_bike
## 791955 3296360A7BC20FB8 2020-03-07 18:02:45 2020-03-07 18:13:18   docked_bike
## 791956 064EC7698E4FF9B3 2020-03-08 13:03:57 2020-03-08 13:32:27   docked_bike
##        start_station_id        start_station_name end_station_id
## 791951              675                     HQ QR            675
## 791952              675                     HQ QR            675
## 791953              161     Rush St & Superior St            240
## 791954              141    Clark St & Lincoln Ave            210
## 791955              672 Franklin St & Illinois St            264
## 791956              110     Dearborn St & Erie St             85
##                    end_station_name member_casual
## 791951                        HQ QR        casual
## 791952                        HQ QR        casual
## 791953 Sheridan Rd & Irving Park Rd        member
## 791954    Ashland Ave & Division St        casual
## 791955 Stetson Ave & South Water St        member
## 791956        Michigan Ave & Oak St        casual

List of columns with their data types

str(alltrips)
## 'data.frame':    791956 obs. of  9 variables:
##  $ ride_id           : chr  "21742443" "21742444" "21742445" "21742446" ...
##  $ started_at        : chr  "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
##  $ ended_at          : chr  "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
##  $ rideable_type     : chr  "2167" "4386" "1524" "252" ...
##  $ start_station_id  : int  199 44 15 123 173 98 98 211 150 268 ...
##  $ start_station_name: chr  "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
##  $ end_station_id    : int  84 624 644 176 35 49 49 142 148 141 ...
##  $ end_station_name  : chr  "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
##  $ member_casual     : chr  "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...

Summary statistics

summary(alltrips)
##    ride_id           started_at          ended_at         rideable_type     
##  Length:791956      Length:791956      Length:791956      Length:791956     
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  start_station_id start_station_name end_station_id  end_station_name  
##  Min.   :  2.0    Length:791956      Min.   :  2.0   Length:791956     
##  1st Qu.: 77.0    Class :character   1st Qu.: 77.0   Class :character  
##  Median :174.0    Mode  :character   Median :174.0   Mode  :character  
##  Mean   :204.4                       Mean   :204.4                     
##  3rd Qu.:291.0                       3rd Qu.:291.0                     
##  Max.   :675.0                       Max.   :675.0                     
##                                      NA's   :1                         
##  member_casual     
##  Length:791956     
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
## 

3.2. Data Cleaning

3.2.1. Standardizing user categories into member and casual

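The data frame contains four categories: casual and member (from the 2020 data) and Customer and Subscriber (from the 2019 data). Subscriber corresponds to member, and Customer corresponds to casual.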

table(alltrips$member_casual)
## 
##     casual   Customer     member Subscriber 
##      48480      23163     378407     341906

Recode Subscriber as member and Customer as casual.

alltrips <-  alltrips %>% 
  mutate(member_casual = recode(member_casual
                                ,"Subscriber" = "member"
                                ,"Customer" = "casual"))

Checking the result

table(alltrips$member_casual)
## 
## casual member 
##  71643 720313
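recode() maps old labels to new ones and leaves unmatched values unchanged. A base R sketch of the same mapping with a named vector, on a toy vector of labels; `label_map` is a hypothetical name:

```r
# Toy labels covering all four categories
member_casual <- c("casual", "Customer", "member", "Subscriber", "Subscriber")

# Map the 2019 labels onto the 2020 labels; other values stay as they are
label_map <- c(Subscriber = "member", Customer = "casual")
to_fix <- member_casual %in% names(label_map)
member_casual[to_fix] <- unname(label_map[member_casual[to_fix]])

print(table(member_casual))
```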

3.2.2. Adding columns to count rides per day of the week

The existing data cannot directly give the number of rides on each day. We therefore add several columns derived from started_at: date, month, day, year, and day of the week.

Adding a date column (the date only, without the time)

alltrips$date <- as.Date(alltrips$started_at)

Adding a month column derived from date

alltrips$month <- format(alltrips$date, "%m")

Adding a day column derived from date

alltrips$day <- format(alltrips$date, "%d")

Adding a year column derived from date

alltrips$year <- format(alltrips$date, "%Y")

Adding a day_of_week column derived from date

# %A returns the weekday name in the session locale (English in this session)
alltrips$day_of_week <- format(alltrips$date, "%A")

Checking the result

str(alltrips)
## 'data.frame':    791956 obs. of  14 variables:
##  $ ride_id           : chr  "21742443" "21742444" "21742445" "21742446" ...
##  $ started_at        : chr  "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
##  $ ended_at          : chr  "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
##  $ rideable_type     : chr  "2167" "4386" "1524" "252" ...
##  $ start_station_id  : int  199 44 15 123 173 98 98 211 150 268 ...
##  $ start_station_name: chr  "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
##  $ end_station_id    : int  84 624 644 176 35 49 49 142 148 141 ...
##  $ end_station_name  : chr  "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
##  $ date              : Date, format: "2019-01-01" "2019-01-01" ...
##  $ month             : chr  "01" "01" "01" "01" ...
##  $ day               : chr  "01" "01" "01" "01" ...
##  $ year              : chr  "2019" "2019" "2019" "2019" ...
##  $ day_of_week       : chr  "Tuesday" "Tuesday" "Tuesday" "Tuesday" ...
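format(date, "%A") returns weekday names in the R session's locale, so the same code run on, for example, an Indonesian-language system would not produce “Tuesday”, and the English level list used in step 4.3 would no longer match. A locale-independent sketch, assuming English names are wanted, uses as.POSIXlt()$wday, which counts from 0 (Sunday) to 6 (Saturday). `day_names` is a hypothetical lookup vector; the two timestamps are copied from the data above.

```r
started_at <- c("2019-01-01 0:04:37", "2020-01-30 14:22:39")

# as.Date() keeps only the date part of "YYYY-MM-DD H:MM:SS"
ride_date <- as.Date(started_at)
month <- format(ride_date, "%m")
year  <- format(ride_date, "%Y")

# $wday runs from 0 (Sunday) to 6 (Saturday), independent of locale
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")
day_of_week <- day_names[as.POSIXlt(ride_date)$wday + 1]

print(data.frame(ride_date, month, year, day_of_week))
```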

3.2.3. Calculating ride duration

Ride duration is the difference between the end time (ended_at) and the start time (started_at), computed with difftime. Setting units = "secs" fixes the unit; with the default units = "auto", R picks the unit from the smallest difference in the data.

alltrips$ride_length <- difftime(alltrips$ended_at, alltrips$started_at, units = "secs")

Checking the result

str(alltrips)
## 'data.frame':    791956 obs. of  15 variables:
##  $ ride_id           : chr  "21742443" "21742444" "21742445" "21742446" ...
##  $ started_at        : chr  "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
##  $ ended_at          : chr  "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
##  $ rideable_type     : chr  "2167" "4386" "1524" "252" ...
##  $ start_station_id  : int  199 44 15 123 173 98 98 211 150 268 ...
##  $ start_station_name: chr  "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
##  $ end_station_id    : int  84 624 644 176 35 49 49 142 148 141 ...
##  $ end_station_name  : chr  "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
##  $ date              : Date, format: "2019-01-01" "2019-01-01" ...
##  $ month             : chr  "01" "01" "01" "01" ...
##  $ day               : chr  "01" "01" "01" "01" ...
##  $ year              : chr  "2019" "2019" "2019" "2019" ...
##  $ day_of_week       : chr  "Tuesday" "Tuesday" "Tuesday" "Tuesday" ...
##  $ ride_length       : 'difftime' num  390 441 829 1783 ...
##   ..- attr(*, "units")= chr "secs"

Converting ride_length from difftime to numeric

is.factor(alltrips$ride_length)
## [1] FALSE
alltrips$ride_length <- as.numeric(alltrips$ride_length, units = "secs")  # plain numbers of seconds
is.numeric(alltrips$ride_length)
## [1] TRUE

Checking the result

str(alltrips)
## 'data.frame':    791956 obs. of  15 variables:
##  $ ride_id           : chr  "21742443" "21742444" "21742445" "21742446" ...
##  $ started_at        : chr  "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
##  $ ended_at          : chr  "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
##  $ rideable_type     : chr  "2167" "4386" "1524" "252" ...
##  $ start_station_id  : int  199 44 15 123 173 98 98 211 150 268 ...
##  $ start_station_name: chr  "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
##  $ end_station_id    : int  84 624 644 176 35 49 49 142 148 141 ...
##  $ end_station_name  : chr  "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
##  $ member_casual     : chr  "member" "member" "member" "member" ...
##  $ date              : Date, format: "2019-01-01" "2019-01-01" ...
##  $ month             : chr  "01" "01" "01" "01" ...
##  $ day               : chr  "01" "01" "01" "01" ...
##  $ year              : chr  "2019" "2019" "2019" "2019" ...
##  $ day_of_week       : chr  "Tuesday" "Tuesday" "Tuesday" "Tuesday" ...
##  $ ride_length       : num  390 441 829 1783 364 ...
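difftime() with the default units = "auto" picks the unit from the smallest absolute difference in the data: seconds if it is under 60 s, minutes if under an hour, and so on. In the str() output above, ride_length is reported in seconds; under units = "auto" that happens only when at least one ride is shorter than a minute, so on a subset without such rides the same default call would return minutes. A small sketch with two rides copied from the first rows of the data; parsing with tz = "UTC" is an assumption that only keeps this toy example reproducible across machines (for the real data, the time zone should be chosen deliberately).

```r
started_at <- c("2019-01-01 0:04:37", "2019-01-01 0:08:13")
ended_at   <- c("2019-01-01 0:11:07", "2019-01-01 0:15:34")

# Parse explicitly with a fixed format and time zone
fmt   <- "%Y-%m-%d %H:%M:%S"
start <- as.POSIXct(started_at, format = fmt, tz = "UTC")
end   <- as.POSIXct(ended_at,   format = fmt, tz = "UTC")

# Default units = "auto": both gaps are >= 60 s, so R chooses minutes
auto_len <- difftime(end, start)
print(units(auto_len))   # "mins"

# Fixing the unit makes the numeric value unambiguous
secs_len <- as.numeric(difftime(end, start, units = "secs"))
print(secs_len)          # 390 441
```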

3.2.4. Removing bad data

A quick look at the data earlier suggested two kinds of bad data in ride_length and start_station_name: negative ride_length values, and rides starting from HQ QR (Head Quarter for Quality Control Reason). HQ QR rides must be excluded because they are bikes taken out for maintenance.

# Count negative durations and HQ QR rides separately
# (a full cross-table of ride_length by station would be enormous)
table(alltrips$ride_length < 0)
table(alltrips$start_station_name == "HQ QR")

The table check shows negative ride_length values and rides starting from HQ QR; removing them drops 3,767 rows in total (791,956 rows before cleaning versus 788,189 after). The clean data are stored in a new data frame named alltrips_2.

alltrips_2 <- alltrips[!(alltrips$start_station_name == "HQ QR" | alltrips$ride_length<0),]
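One caveat about this filter: if start_station_name or ride_length contains NA, the condition is NA for that row, and indexing with it puts a row of NAs in place of the original row. summary() above does not report NA counts for character columns, so its output cannot rule this out. A hedged sketch on toy data that treats NA as “not bad” (keeping such rows is an assumption; dropping them is an equally valid choice):

```r
toy <- data.frame(
  start_station_name = c("Clark St & Elm St", "HQ QR", "Wabash Ave & Grand Ave", NA),
  ride_length        = c(390, 5, -2, 441)
)

# Naive condition, as in the cleaning step above
bad <- toy$start_station_name == "HQ QR" | toy$ride_length < 0
print(bad)                  # FALSE TRUE TRUE NA

naive <- toy[!bad, ]
print(naive$ride_length)    # 390 NA (row 4 became an all-NA row)

# NA-safe version: only rows where the condition is definitely TRUE are dropped
keep  <- !(bad %in% TRUE)
clean <- toy[keep, ]
print(clean$ride_length)    # 390 441
```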

Step 4: Descriptive Analysis

4.1. Calculating the mean, median, max, and min ride duration (ride_length, in seconds) for all users

summary(alltrips_2$ride_length)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##        1      331      539     1190      912 10632022

4.2. Comparing the mean, median, max, and min of the two user groups

aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual, FUN = mean)
##   alltrips_2$member_casual alltrips_2$ride_length
## 1                   casual              5372.7839
## 2                   member               795.2523
aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual, FUN = median)
##   alltrips_2$member_casual alltrips_2$ride_length
## 1                   casual                   1393
## 2                   member                    508
aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual, FUN = max)
##   alltrips_2$member_casual alltrips_2$ride_length
## 1                   casual               10632022
## 2                   member                6096428
aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual, FUN = min)
##   alltrips_2$member_casual alltrips_2$ride_length
## 1                   casual                      2
## 2                   member                      1

4.3 Calculating the average ride duration of the two user groups by day

aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual + alltrips_2$day_of_week, FUN = mean)
##    alltrips_2$member_casual alltrips_2$day_of_week alltrips_2$ride_length
## 1                    casual                 Friday              6090.7373
## 2                    member                 Friday               796.7338
## 3                    casual                 Monday              4752.0504
## 4                    member                 Monday               822.3112
## 5                    casual               Saturday              4950.7708
## 6                    member               Saturday               974.0730
## 7                    casual                 Sunday              5061.3044
## 8                    member                 Sunday               972.9383
## 9                    casual               Thursday              8451.6669
## 10                   member               Thursday               707.2093
## 11                   casual                Tuesday              4561.8039
## 12                   member                Tuesday               769.4416
## 13                   casual              Wednesday              4480.3724
## 14                   member              Wednesday               711.9838

Fixing the order of the days

alltrips_2$day_of_week <- ordered(alltrips_2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

Recalculating the average ride duration of the two user groups by day

aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual + alltrips_2$day_of_week, FUN = mean)
##    alltrips_2$member_casual alltrips_2$day_of_week alltrips_2$ride_length
## 1                    casual                 Sunday              5061.3044
## 2                    member                 Sunday               972.9383
## 3                    casual                 Monday              4752.0504
## 4                    member                 Monday               822.3112
## 5                    casual                Tuesday              4561.8039
## 6                    member                Tuesday               769.4416
## 7                    casual              Wednesday              4480.3724
## 8                    member              Wednesday               711.9838
## 9                    casual               Thursday              8451.6669
## 10                   member               Thursday               707.2093
## 11                   casual                 Friday              6090.7373
## 12                   member                 Friday               796.7338
## 13                   casual               Saturday              4950.7708
## 14                   member               Saturday               974.0730
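The ordered factor matters because sorting and grouping follow the level order rather than the alphabet. A small sketch; note that any value not in the level list, such as the Indonesian weekday name “Selasa”, silently becomes NA, which is one more reason to produce English weekday names consistently:

```r
day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
day_of_week <- c("Tuesday", "Sunday", "Friday", "Monday")

print(sort(day_of_week))   # alphabetical: Friday Monday Sunday Tuesday

dow <- ordered(day_of_week, levels = day_levels)
print(sort(dow))           # calendar order: Sunday Monday Tuesday Friday

# A name outside the level list becomes NA without any warning
unknown <- ordered("Selasa", levels = day_levels)
print(is.na(unknown))      # TRUE
```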

4.4 Analyzing ridership by user type and weekday

alltrips_2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>%  #creates weekday field using wday()
  group_by(member_casual, weekday) %>%  #groups by usertype and weekday
  summarise(number_of_rides = n() #calculates the number of rides and average duration 
            ,average_duration = mean(ride_length)) %>%      # calculates the average duration
  arrange(member_casual, weekday)   # sorts
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups:   member_casual [2]
##    member_casual weekday number_of_rides average_duration
##    <chr>         <ord>             <int>            <dbl>
##  1 casual        Sun               18652            5061.
##  2 casual        Mon                5591            4752.
##  3 casual        Tue                7311            4562.
##  4 casual        Wed                7690            4480.
##  5 casual        Thu                7147            8452.
##  6 casual        Fri                8013            6091.
##  7 casual        Sat               13473            4951.
##  8 member        Sun               60197             973.
##  9 member        Mon              110430             822.
## 10 member        Tue              127974             769.
## 11 member        Wed              121902             712.
## 12 member        Thu              125228             707.
## 13 member        Fri              115168             797.
## 14 member        Sat               59413             974.
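The dplyr pipeline above counts rides and averages ride_length per user type and weekday. A base R sketch of the same summary using aggregate() and merge() on toy data; the weekday column is plain character here, so it sorts alphabetically rather than in calendar order. In the dplyr version, adding .groups = "drop" to summarise() would silence the grouping message shown above.

```r
toy <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  weekday       = c("Sun", "Sun", "Mon", "Mon", "Sun"),
  ride_length   = c(1000, 3000, 500, 700, 900)
)

# Number of rides and mean duration per user type and weekday
counts <- aggregate(ride_length ~ member_casual + weekday, data = toy, FUN = length)
means  <- aggregate(ride_length ~ member_casual + weekday, data = toy, FUN = mean)
names(counts)[3] <- "number_of_rides"
names(means)[3]  <- "average_duration"

# Join the two summaries and sort like arrange(member_casual, weekday)
summary_tbl <- merge(counts, means, by = c("member_casual", "weekday"))
summary_tbl <- summary_tbl[order(summary_tbl$member_casual, summary_tbl$weekday), ]
print(summary_tbl)
```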

4.5 Visualization

4.5.1. Visualizing the number of rides by user type and day

alltrips_2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

4.5.2. Visualizing the average ride duration by user type and day

alltrips_2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

table(alltrips_2$member_casual)
## 
## casual member 
##  67877 720312

5. Analysis and Recommendations

The analysis in Step 4 yields the following findings:

  1. Most rides come from members: 720,312 rides, or 91.39 percent of all rides.

  2. Members ride most on Monday through Friday. On each of these five workdays, members take roughly twice as many rides as on Saturday or Sunday. Saturday and Sunday account for only 8.25% and 8.36% of all member rides, respectively, while the other days range from 15.33% (Monday) to 17.77% (Tuesday).

  3. By contrast, casual users account for only 67,877 rides (8.61 percent of all rides). Casual users ride most on Saturday and Sunday: their Saturday and Sunday ride counts are 1.88 and 2.61 times their average daily count on the other five days.

  4. Although far fewer in number, casual users ride much longer: their mean ride lasts 5,372.78 seconds (89.55 minutes), about 6.76 times the members' mean of 795.25 seconds (13.25 minutes). The means are inflated by a few extremely long rides (the maximum is 10,632,022 seconds); the medians, 1,393 seconds for casual users and 508 seconds for members, still show casual rides about 2.7 times longer.
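The figures in these findings can be re-derived from the counts and means printed in Step 4. A small check in R, using only numbers copied from the outputs above:

```r
# Ride counts per weekday, copied from the output of step 4.4
member_rides <- c(Sun = 60197, Mon = 110430, Tue = 127974, Wed = 121902,
                  Thu = 125228, Fri = 115168, Sat = 59413)
casual_rides <- c(Sun = 18652, Mon = 5591, Tue = 7311, Wed = 7690,
                  Thu = 7147, Fri = 8013, Sat = 13473)
total <- sum(member_rides) + sum(casual_rides)   # 788189

# Findings 1 and 3: share of all rides, in percent
member_share <- round(100 * sum(member_rides) / total, 2)   # 91.39
casual_share <- round(100 * sum(casual_rides) / total, 2)   # 8.61

# Finding 2: each weekday's share of all member rides, in percent
member_day_share <- round(100 * member_rides / sum(member_rides), 2)
print(member_day_share)

# Finding 3: casual weekend counts relative to the casual weekday average
weekday_avg   <- mean(casual_rides[c("Mon", "Tue", "Wed", "Thu", "Fri")])
weekend_ratio <- round(casual_rides[c("Sat", "Sun")] / weekday_avg, 2)  # 1.88, 2.61

# Finding 4: mean ride length (seconds, from step 4.2) in minutes, and the ratio
mean_secs      <- c(casual = 5372.7839, member = 795.2523)
mean_minutes   <- round(mean_secs / 60, 2)                                # 89.55, 13.25
duration_ratio <- round(mean_secs[["casual"]] / mean_secs[["member"]], 2) # 6.76
```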

Recommendations: to improve the chances of converting casual users into members, the company should:

  1. Raise the prices of single-ride and full-day passes, especially for the second 30 minutes of a ride. This acts as a disincentive to staying a casual user: to get a lower price, users would sign up for a membership.

  2. Offer prize draws for members who ride on Saturday and Sunday. These are the days when casual users ride the most, so a members-only prize gives casual users an incentive to sign up.

  3. Intensify the distribution of promotional material on Saturday and Sunday.

6. Exporting a summary file for further analysis

counts <- aggregate(ride_length ~ member_casual + day_of_week, data = alltrips_2, FUN = mean)
# row.names = FALSE avoids writing an extra index column
write.csv(counts, file = "avg_ride_length.csv", row.names = FALSE)
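A round trip through a temporary file is a quick way to check what write.csv() produces. In this sketch, `counts` holds two rows copied from the day-of-week means above, and the path under tempdir() is an assumption. With row.names = FALSE the file has no extra index column, which read.csv() would otherwise read back as a column named X.

```r
# Two rows copied from the day-of-week means in step 4.3
counts <- data.frame(member_casual = c("casual", "member"),
                     day_of_week   = c("Sunday", "Sunday"),
                     ride_length   = c(5061.3044, 972.9383))

# Write to a temporary file and read it back
path <- file.path(tempdir(), "avg_ride_length.csv")
write.csv(counts, file = path, row.names = FALSE)

back <- read.csv(path)
print(back)
```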