Bike Users Analysis
George
2025-08-31
Background
Cyclistic is a fictional bike-share company in Chicago, United States. It operates 5,824 bicycles across 692 stations. Riders can choose among three pricing options: single-ride passes, full-day passes, and annual memberships. Riders who buy single-ride or full-day passes are called casual users; those who pay for an annual membership are called members. The marketing team wants to increase revenue by converting casual users into members. That requires an analysis of how members and casual users differ in the way they use Cyclistic's service.
Understanding user behavior lets the company choose an appropriate marketing strategy for converting casual users into members.
Step 1: Preparing the Working Environment and Loading the Data into R
To answer this question, the fictional case uses data from the City of Chicago's ("City") Divvy bicycle sharing service. Two data sets of bike trips are available: 2019 Q1 and 2020 Q1.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.2 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
Read the CSV files with read.csv and store them in new data frames named trips_2019_q1 and trips_2020_q1.
trips_2019_q1 <- read.csv("Divvy_Trips_2019_Q1.csv")
trips_2020_q1 <- read.csv("Divvy_Trips_2020_Q1.csv")
Step 2: Combining the Two Data Frames into One
2.1. Compare the variables of the two data frames
colnames(trips_2019_q1)
## [1] "trip_id" "start_time" "end_time"
## [4] "bikeid" "tripduration" "from_station_id"
## [7] "from_station_name" "to_station_id" "to_station_name"
## [10] "usertype" "gender" "birthyear"
colnames(trips_2020_q1)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
str(trips_2019_q1)
## 'data.frame': 365069 obs. of 12 variables:
## $ trip_id : int 21742443 21742444 21742445 21742446 21742447 21742448 21742449 21742450 21742451 21742452 ...
## $ start_time : chr "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
## $ end_time : chr "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
## $ bikeid : int 2167 4386 1524 252 1170 2437 2708 2796 6205 3939 ...
## $ tripduration : chr "390" "441" "829" "1,783.00" ...
## $ from_station_id : int 199 44 15 123 173 98 98 211 150 268 ...
## $ from_station_name: chr "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
## $ to_station_id : int 84 624 644 176 35 49 49 142 148 141 ...
## $ to_station_name : chr "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
## $ usertype : chr "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...
## $ gender : chr "Male" "Female" "Female" "Male" ...
## $ birthyear : int 1989 1990 1994 1993 1994 1983 1984 1990 1995 1996 ...
str(trips_2020_q1)
## 'data.frame': 426887 obs. of 13 variables:
## $ ride_id : chr "EACB19130B0CDA4A" "8FED874C809DC021" "789F3C21E472CA96" "C9A388DAC6ABF313" ...
## $ rideable_type : chr "docked_bike" "docked_bike" "docked_bike" "docked_bike" ...
## $ started_at : chr "2020-01-21 20:06:59" "2020-01-30 14:22:39" "2020-01-09 19:29:26" "2020-01-06 16:17:07" ...
## $ ended_at : chr "2020-01-21 20:14:30" "2020-01-30 14:26:22" "2020-01-09 19:32:17" "2020-01-06 16:25:56" ...
## $ start_station_name: chr "Western Ave & Leland Ave" "Clark St & Montrose Ave" "Broadway & Belmont Ave" "Clark St & Randolph St" ...
## $ start_station_id : int 239 234 296 51 66 212 96 96 212 38 ...
## $ end_station_name : chr "Clark St & Leland Ave" "Southport Ave & Irving Park Rd" "Wilton Ave & Belmont Ave" "Fairbanks Ct & Grand Ave" ...
## $ end_station_id : int 326 318 117 24 212 96 212 212 96 100 ...
## $ start_lat : num 42 42 41.9 41.9 41.9 ...
## $ start_lng : num -87.7 -87.7 -87.6 -87.6 -87.6 ...
## $ end_lat : num 42 42 41.9 41.9 41.9 ...
## $ end_lng : num -87.7 -87.7 -87.7 -87.6 -87.6 ...
## $ member_casual : chr "member" "member" "member" "member" ...
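The column comparison in 2.1 can also be done programmatically instead of by eye. The following is a minimal sketch on two toy data frames (df_2019 and df_2020 are hypothetical stand-ins, not the Divvy files) that uses base R set operations to list the columns unique to each frame and the columns they share.

```r
# Toy stand-ins for the two quarterly files (hypothetical, two rows each).
df_2019 <- data.frame(trip_id = 1:2,
                      usertype = c("Subscriber", "Customer"),
                      gender = c("Male", "Female"))
df_2020 <- data.frame(ride_id = c("A1", "B2"),
                      member_casual = c("member", "casual"),
                      start_lat = c(41.9, 42.0))

only_2019 <- setdiff(names(df_2019), names(df_2020))  # columns missing from 2020
only_2020 <- setdiff(names(df_2020), names(df_2019))  # columns missing from 2019
shared    <- intersect(names(df_2019), names(df_2020))

print(only_2019)
print(only_2020)
print(shared)  # empty here: no names match before renaming
```

Running the same three calls again after the rename step shows which columns still exist in only one frame and therefore need to be dropped or filled.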
2.2. Standardize the variable names: rename the variables in trips_2019_q1 to match trips_2020_q1
(trips_2019_q1 <- rename(trips_2019_q1,
ride_id = trip_id,
rideable_type = bikeid,
started_at = start_time,
ended_at = end_time,
start_station_id = from_station_id,
start_station_name = from_station_name,
end_station_name = to_station_name,
end_station_id = to_station_id,
member_casual = usertype))
2.3. Check the result of the rename with str
str(trips_2019_q1)
## 'data.frame': 365069 obs. of 12 variables:
## $ ride_id : int 21742443 21742444 21742445 21742446 21742447 21742448 21742449 21742450 21742451 21742452 ...
## $ started_at : chr "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
## $ ended_at : chr "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
## $ rideable_type : int 2167 4386 1524 252 1170 2437 2708 2796 6205 3939 ...
## $ tripduration : chr "390" "441" "829" "1,783.00" ...
## $ start_station_id : int 199 44 15 123 173 98 98 211 150 268 ...
## $ start_station_name: chr "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
## $ end_station_id : int 84 624 644 176 35 49 49 142 148 141 ...
## $ end_station_name : chr "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
## $ member_casual : chr "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...
## $ gender : chr "Male" "Female" "Female" "Male" ...
## $ birthyear : int 1989 1990 1994 1993 1994 1983 1984 1990 1995 1996 ...
str(trips_2020_q1)
## 'data.frame': 426887 obs. of 13 variables:
## $ ride_id : chr "EACB19130B0CDA4A" "8FED874C809DC021" "789F3C21E472CA96" "C9A388DAC6ABF313" ...
## $ rideable_type : chr "docked_bike" "docked_bike" "docked_bike" "docked_bike" ...
## $ started_at : chr "2020-01-21 20:06:59" "2020-01-30 14:22:39" "2020-01-09 19:29:26" "2020-01-06 16:17:07" ...
## $ ended_at : chr "2020-01-21 20:14:30" "2020-01-30 14:26:22" "2020-01-09 19:32:17" "2020-01-06 16:25:56" ...
## $ start_station_name: chr "Western Ave & Leland Ave" "Clark St & Montrose Ave" "Broadway & Belmont Ave" "Clark St & Randolph St" ...
## $ start_station_id : int 239 234 296 51 66 212 96 96 212 38 ...
## $ end_station_name : chr "Clark St & Leland Ave" "Southport Ave & Irving Park Rd" "Wilton Ave & Belmont Ave" "Fairbanks Ct & Grand Ave" ...
## $ end_station_id : int 326 318 117 24 212 96 212 212 96 100 ...
## $ start_lat : num 42 42 41.9 41.9 41.9 ...
## $ start_lng : num -87.7 -87.7 -87.6 -87.6 -87.6 ...
## $ end_lat : num 42 42 41.9 41.9 41.9 ...
## $ end_lng : num -87.7 -87.7 -87.7 -87.6 -87.6 ...
## $ member_casual : chr "member" "member" "member" "member" ...
The two data frames still differ in two ways:
1. Data types: ride_id and rideable_type are integer in trips_2019_q1 but character in trips_2020_q1.
2. Values of member_casual: trips_2019_q1 uses "Subscriber" and "Customer", while trips_2020_q1 uses "member" and "casual".
The first issue (data types) is fixed before the two data frames are combined. The membership labels are fixed after the data is combined.
2.4. Standardize the data types: convert ride_id and rideable_type in trips_2019_q1 to character
trips_2019_q1 <- mutate(trips_2019_q1, ride_id = as.character(ride_id),
rideable_type = as.character(rideable_type))
2.5. Stack the two data frames into a new data frame, alltrips, with bind_rows
alltrips <- bind_rows(trips_2019_q1, trips_2020_q1)
2.6. Drop the variables that exist in only one data frame
The variables start_lat, start_lng, end_lat, and end_lng exist only in trips_2020_q1, while tripduration, gender, and birthyear exist only in trips_2019_q1.
alltrips <- alltrips %>%
select(-c(start_lat, start_lng, end_lat,end_lng, tripduration, gender, birthyear ))
Step 3: Data Cleaning and Preparation for Analysis
3.1. Inspect the combined data frame (alltrips)
Column names
colnames(alltrips)
## [1] "ride_id" "started_at" "ended_at"
## [4] "rideable_type" "start_station_id" "start_station_name"
## [7] "end_station_id" "end_station_name" "member_casual"
Number of rows (observations)
nrow(alltrips)
## [1] 791956
Dimensions of the data frame
dim(alltrips)
## [1] 791956 9
First and last six rows of the data frame
head(alltrips)
## ride_id started_at ended_at rideable_type start_station_id
## 1 21742443 2019-01-01 0:04:37 2019-01-01 0:11:07 2167 199
## 2 21742444 2019-01-01 0:08:13 2019-01-01 0:15:34 4386 44
## 3 21742445 2019-01-01 0:13:23 2019-01-01 0:27:12 1524 15
## 4 21742446 2019-01-01 0:13:45 2019-01-01 0:43:28 252 123
## 5 21742447 2019-01-01 0:14:52 2019-01-01 0:20:56 1170 173
## 6 21742448 2019-01-01 0:15:33 2019-01-01 0:19:09 2437 98
## start_station_name end_station_id
## 1 Wabash Ave & Grand Ave 84
## 2 State St & Randolph St 624
## 3 Racine Ave & 18th St 644
## 4 California Ave & Milwaukee Ave 176
## 5 Mies van der Rohe Way & Chicago Ave 35
## 6 LaSalle St & Washington St 49
## end_station_name member_casual
## 1 Milwaukee Ave & Grand Ave Subscriber
## 2 Dearborn St & Van Buren St (*) Subscriber
## 3 Western Ave & Fillmore St (*) Subscriber
## 4 Clark St & Elm St Subscriber
## 5 Streeter Dr & Grand Ave Subscriber
## 6 Dearborn St & Monroe St Subscriber
tail(alltrips)
## ride_id started_at ended_at rideable_type
## 791951 6F4D221BDDFD943F 2020-03-10 10:40:27 2020-03-10 10:40:29 docked_bike
## 791952 ADDAA33CEBCAE733 2020-03-10 10:40:06 2020-03-10 10:40:07 docked_bike
## 791953 82B10FA3994BC66A 2020-03-07 15:25:55 2020-03-07 16:14:03 docked_bike
## 791954 AA0D5AAA0B59C8AA 2020-03-01 13:12:38 2020-03-01 13:38:29 docked_bike
## 791955 3296360A7BC20FB8 2020-03-07 18:02:45 2020-03-07 18:13:18 docked_bike
## 791956 064EC7698E4FF9B3 2020-03-08 13:03:57 2020-03-08 13:32:27 docked_bike
## start_station_id start_station_name end_station_id
## 791951 675 HQ QR 675
## 791952 675 HQ QR 675
## 791953 161 Rush St & Superior St 240
## 791954 141 Clark St & Lincoln Ave 210
## 791955 672 Franklin St & Illinois St 264
## 791956 110 Dearborn St & Erie St 85
## end_station_name member_casual
## 791951 HQ QR casual
## 791952 HQ QR casual
## 791953 Sheridan Rd & Irving Park Rd member
## 791954 Ashland Ave & Division St casual
## 791955 Stetson Ave & South Water St member
## 791956 Michigan Ave & Oak St casual
Columns with their data types
str(alltrips)
## 'data.frame': 791956 obs. of 9 variables:
## $ ride_id : chr "21742443" "21742444" "21742445" "21742446" ...
## $ started_at : chr "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
## $ ended_at : chr "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
## $ rideable_type : chr "2167" "4386" "1524" "252" ...
## $ start_station_id : int 199 44 15 123 173 98 98 211 150 268 ...
## $ start_station_name: chr "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
## $ end_station_id : int 84 624 644 176 35 49 49 142 148 141 ...
## $ end_station_name : chr "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
## $ member_casual : chr "Subscriber" "Subscriber" "Subscriber" "Subscriber" ...
Summary statistics
summary(alltrips)
## ride_id started_at ended_at rideable_type
## Length:791956 Length:791956 Length:791956 Length:791956
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## start_station_id start_station_name end_station_id end_station_name
## Min. : 2.0 Length:791956 Min. : 2.0 Length:791956
## 1st Qu.: 77.0 Class :character 1st Qu.: 77.0 Class :character
## Median :174.0 Mode :character Median :174.0 Mode :character
## Mean :204.4 Mean :204.4
## 3rd Qu.:291.0 3rd Qu.:291.0
## Max. :675.0 Max. :675.0
## NA's :1
## member_casual
## Length:791956
## Class :character
## Mode :character
##
##
##
##
3.2. Data Cleaning
3.2.1. Standardize the user categories to member and casual
The data frame contains four labels for two categories: member/Subscriber and casual/Customer.
table(alltrips$member_casual)
##
## casual Customer member Subscriber
## 48480 23163 378407 341906
Recode Subscriber to member and Customer to casual.
alltrips <- alltrips %>%
mutate(member_casual = recode(member_casual
,"Subscriber" = "member"
,"Customer" = "casual"))
Check the result
table(alltrips$member_casual)
##
## casual member
## 71643 720313
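The recoding can be illustrated without dplyr. The sketch below is an assumption-labelled base R alternative, not the method used in this report: it maps old labels to new ones through a named lookup vector (lookup is a hypothetical name) and leaves labels that are not in the lookup unchanged.

```r
# Toy vector with the four labels seen in table(alltrips$member_casual).
labels <- c("Subscriber", "Customer", "member", "casual", "Subscriber")

# Named lookup: old label -> new label. Labels not listed are kept as they are.
lookup <- c(Subscriber = "member", Customer = "casual")
recoded <- ifelse(labels %in% names(lookup), lookup[labels], labels)
recoded <- unname(recoded)

print(table(recoded))
```

The counts in the two tables above are consistent with this mapping: casual 48,480 + Customer 23,163 = 71,643 and member 378,407 + Subscriber 341,906 = 720,313.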
3.2.2. Add columns so that rides can be counted per day of the week
Ride counts per day cannot be produced directly from the existing columns. We therefore add several columns derived from started_at: date, year, month, day, and day of the week.
Add a date column without the time component
alltrips$date <- as.Date(alltrips$started_at)
Add a month column derived from the date column
alltrips$month <- format(as.Date(alltrips$date), "%m")
Add a day column derived from the date column
alltrips$day <- format(as.Date(alltrips$date), "%d")
Add a year column derived from the date column
alltrips$year <- format(as.Date(alltrips$date), "%Y")
Add a day_of_week column derived from the date column
alltrips$day_of_week <- format(as.Date(alltrips$date), "%A")
Check the result
str(alltrips)
## 'data.frame': 791956 obs. of 14 variables:
## $ ride_id : chr "21742443" "21742444" "21742445" "21742446" ...
## $ started_at : chr "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
## $ ended_at : chr "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
## $ rideable_type : chr "2167" "4386" "1524" "252" ...
## $ start_station_id : int 199 44 15 123 173 98 98 211 150 268 ...
## $ start_station_name: chr "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
## $ end_station_id : int 84 624 644 176 35 49 49 142 148 141 ...
## $ end_station_name : chr "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
## $ member_casual : chr "member" "member" "member" "member" ...
## $ date : Date, format: "2019-01-01" "2019-01-01" ...
## $ month : chr "01" "01" "01" "01" ...
## $ day : chr "01" "01" "01" "01" ...
## $ year : chr "2019" "2019" "2019" "2019" ...
## $ day_of_week : chr "Tuesday" "Tuesday" "Tuesday" "Tuesday" ...
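One detail worth keeping in mind: "%A" returns the weekday name in the session's locale, so the English names seen here are not guaranteed on every machine. The sketch below derives the same parts from two timestamps copied from the output above and adds "%u" (the weekday number, 1 = Monday to 7 = Sunday) as a locale-independent alternative. The iso_weekday column is an addition for illustration and is not part of the report's data frame.

```r
started_at <- c("2019-01-01 0:04:37", "2020-03-08 13:03:57")
d <- as.Date(started_at)  # trailing time text is ignored, leaving the date

parts <- data.frame(
  date        = d,
  year        = format(d, "%Y"),
  month       = format(d, "%m"),
  day         = format(d, "%d"),
  day_of_week = format(d, "%A"),  # locale-dependent name, "Tuesday" in English
  iso_weekday = format(d, "%u")   # locale-independent: "1" = Monday ... "7" = Sunday
)
print(parts)
```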
3.2.3. Compute the ride duration
The ride duration is stored in a new column, ride_length, holding the difference between the departure time (started_at) and the arrival time (ended_at), computed with difftime.
alltrips$ride_length <- difftime(alltrips$ended_at,alltrips$started_at)
Check the result
str(alltrips)
## 'data.frame': 791956 obs. of 15 variables:
## $ ride_id : chr "21742443" "21742444" "21742445" "21742446" ...
## $ started_at : chr "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
## $ ended_at : chr "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
## $ rideable_type : chr "2167" "4386" "1524" "252" ...
## $ start_station_id : int 199 44 15 123 173 98 98 211 150 268 ...
## $ start_station_name: chr "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
## $ end_station_id : int 84 624 644 176 35 49 49 142 148 141 ...
## $ end_station_name : chr "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
## $ member_casual : chr "member" "member" "member" "member" ...
## $ date : Date, format: "2019-01-01" "2019-01-01" ...
## $ month : chr "01" "01" "01" "01" ...
## $ day : chr "01" "01" "01" "01" ...
## $ year : chr "2019" "2019" "2019" "2019" ...
## $ day_of_week : chr "Tuesday" "Tuesday" "Tuesday" "Tuesday" ...
## $ ride_length : 'difftime' num 390 441 829 1783 ...
## ..- attr(*, "units")= chr "secs"
Convert ride_length to numeric. The is.factor check returns FALSE because the column is a difftime object, not a factor.
is.factor(alltrips$ride_length)
## [1] FALSE
alltrips$ride_length <- as.numeric(as.character(alltrips$ride_length))
is.numeric(alltrips$ride_length)
## [1] TRUE
Check the result
str(alltrips)
## 'data.frame': 791956 obs. of 15 variables:
## $ ride_id : chr "21742443" "21742444" "21742445" "21742446" ...
## $ started_at : chr "2019-01-01 0:04:37" "2019-01-01 0:08:13" "2019-01-01 0:13:23" "2019-01-01 0:13:45" ...
## $ ended_at : chr "2019-01-01 0:11:07" "2019-01-01 0:15:34" "2019-01-01 0:27:12" "2019-01-01 0:43:28" ...
## $ rideable_type : chr "2167" "4386" "1524" "252" ...
## $ start_station_id : int 199 44 15 123 173 98 98 211 150 268 ...
## $ start_station_name: chr "Wabash Ave & Grand Ave" "State St & Randolph St" "Racine Ave & 18th St" "California Ave & Milwaukee Ave" ...
## $ end_station_id : int 84 624 644 176 35 49 49 142 148 141 ...
## $ end_station_name : chr "Milwaukee Ave & Grand Ave" "Dearborn St & Van Buren St (*)" "Western Ave & Fillmore St (*)" "Clark St & Elm St" ...
## $ member_casual : chr "member" "member" "member" "member" ...
## $ date : Date, format: "2019-01-01" "2019-01-01" ...
## $ month : chr "01" "01" "01" "01" ...
## $ day : chr "01" "01" "01" "01" ...
## $ year : chr "2019" "2019" "2019" "2019" ...
## $ day_of_week : chr "Tuesday" "Tuesday" "Tuesday" "Tuesday" ...
## $ ride_length : num 390 441 829 1783 364 ...
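When difftime receives character input it parses the timestamps in the session's time zone and picks the units automatically. The following sketch shows a more explicit variant on two rides copied from the output above: the timestamps are parsed first and the units are fixed to seconds. The choice of tz = "UTC" is an assumption made for the sketch; the source does not state a time zone, and durations that span a daylight-saving change depend on that choice.

```r
started_at <- c("2019-01-01 0:04:37", "2019-01-01 0:13:45")
ended_at   <- c("2019-01-01 0:11:07", "2019-01-01 0:43:28")

# Parse explicitly; tz = "UTC" is an assumption made for this sketch.
start <- as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
end   <- as.POSIXct(ended_at,   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Fix the units so the result does not depend on the automatic choice,
# then convert the difftime object straight to numeric.
ride_length <- as.numeric(difftime(end, start, units = "secs"))
print(ride_length)
```

The two values agree with the first and fourth entries of ride_length shown in the str output (390 and 1783 seconds).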
3.2.4. Remove bad data
Inspect ride_length and start_station_name, which an earlier quick look suggested may contain bad data: negative ride_length values, and rides whose start_station_name is HQ QR (headquarters, quality-control reasons). The HQ QR rows must be removed because they record bikes taken out of the docks for servicing, not customer rides.
table(alltrips$ride_length, alltrips$start_station_name)
The check with table shows that some ride_length values are negative and that 3,767 rides start at HQ QR. These rows are removed, and the cleaned data is stored in a new data frame named alltrips_2.
alltrips_2 <- alltrips[!(alltrips$start_station_name == "HQ QR" | alltrips$ride_length<0),]
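A caveat about this subsetting pattern: when the condition evaluates to NA (for example because a station name is missing), `[` returns a row filled with NA instead of dropping or keeping the original row. Whether alltrips contains such rows is not shown in this report, so the following is only a sketch on toy data (trips, bad, naive, and clean are hypothetical names) of the behavior and of an NA-safe variant.

```r
trips <- data.frame(
  start_station_name = c("HQ QR", "Clark St & Elm St", NA, "State St & Randolph St"),
  ride_length        = c(2, 390, 441, -5)
)

bad <- trips$start_station_name == "HQ QR" | trips$ride_length < 0
print(bad)  # the third element is NA because the station name is missing

# Pattern from the text: the NA condition produces an all-NA row.
naive <- trips[!bad, ]
print(naive)

# NA-safe variant: only rows where the condition is known to be TRUE are removed.
clean <- trips[!(bad %in% TRUE), ]
print(clean)
```

As a consistency check on the report's own numbers, alltrips has 791,956 rows and alltrips_2 has 67,877 + 720,312 = 788,189 rows, so 3,767 rows were removed in total.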
Step 4: Descriptive Analysis
4.1. Compute the mean, median, maximum, and minimum ride duration (ride_length, in seconds) for all users
summary(alltrips_2$ride_length)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1 331 539 1190 912 10632022
4.2. Compare the mean, median, maximum, and minimum between the two user groups
aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual, FUN = mean)
## alltrips_2$member_casual alltrips_2$ride_length
## 1 casual 5372.7839
## 2 member 795.2523
aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual, FUN = median)
## alltrips_2$member_casual alltrips_2$ride_length
## 1 casual 1393
## 2 member 508
aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual, FUN = max)
## alltrips_2$member_casual alltrips_2$ride_length
## 1 casual 10632022
## 2 member 6096428
aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual, FUN = min)
## alltrips_2$member_casual alltrips_2$ride_length
## 1 casual 2
## 2 member 1
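The four aggregate calls can be folded into one. The sketch below, on toy data with made-up durations, passes a function that returns all four statistics and uses the data argument so that the output columns get plain names instead of names such as alltrips_2$ride_length. It is an illustrative alternative, not the code used in this report.

```r
# Toy data; the durations are invented for illustration.
trips <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member", "member"),
  ride_length   = c(1200, 1500, 9000, 400, 500, 900)
)

stats <- aggregate(ride_length ~ member_casual, data = trips,
                   FUN = function(x) c(mean = mean(x), median = median(x),
                                       max = max(x), min = min(x)))

# aggregate() stores the four statistics as one matrix column; flatten it
# into ordinary columns named ride_length.mean, ride_length.median, and so on.
stats <- do.call(data.frame, stats)
print(stats)
```

The toy casual group also shows why the mean and median are reported side by side: a single very long ride pulls the mean far above the median, the same pattern visible in the real output above.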
4.3. Compute the average ride duration of the two user groups by day of the week
aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual + alltrips_2$day_of_week, FUN = mean)
## alltrips_2$member_casual alltrips_2$day_of_week alltrips_2$ride_length
## 1 casual Friday 6090.7373
## 2 member Friday 796.7338
## 3 casual Monday 4752.0504
## 4 member Monday 822.3112
## 5 casual Saturday 4950.7708
## 6 member Saturday 974.0730
## 7 casual Sunday 5061.3044
## 8 member Sunday 972.9383
## 9 casual Thursday 8451.6669
## 10 member Thursday 707.2093
## 11 casual Tuesday 4561.8039
## 12 member Tuesday 769.4416
## 13 casual Wednesday 4480.3724
## 14 member Wednesday 711.9838
The days above are sorted alphabetically. Fix the order of the days
alltrips_2$day_of_week <- ordered(alltrips_2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
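The effect of the ordered factor can be seen on a small vector. This sketch assumes English day names, as produced by "%A" in this report; week and days_ord are hypothetical names.

```r
days <- c("Friday", "Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday")
print(sort(days))  # character vectors sort alphabetically

week <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
days_ord <- factor(days, levels = week, ordered = TRUE)
print(as.character(sort(days_ord)))  # now sorted Sunday through Saturday

# A name that is not among the levels (for example one produced in a
# non-English locale) silently becomes NA, so a check after the conversion helps.
has_na <- anyNA(factor(c("Monday", "Senin"), levels = week, ordered = TRUE))
print(has_na)
```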
Recompute the average ride duration of the two user groups by day of the week
aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual + alltrips_2$day_of_week, FUN = mean)
## alltrips_2$member_casual alltrips_2$day_of_week alltrips_2$ride_length
## 1 casual Sunday 5061.3044
## 2 member Sunday 972.9383
## 3 casual Monday 4752.0504
## 4 member Monday 822.3112
## 5 casual Tuesday 4561.8039
## 6 member Tuesday 769.4416
## 7 casual Wednesday 4480.3724
## 8 member Wednesday 711.9838
## 9 casual Thursday 8451.6669
## 10 member Thursday 707.2093
## 11 casual Friday 6090.7373
## 12 member Friday 796.7338
## 13 casual Saturday 4950.7708
## 14 member Saturday 974.0730
4.4. Analyze ridership by user type and weekday
alltrips_2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>% #creates weekday field using wday()
group_by(member_casual, weekday) %>% #groups by usertype and weekday
summarise(number_of_rides = n() #calculates the number of rides
,average_duration = mean(ride_length)) %>% # calculates the average duration
arrange(member_casual, weekday) # sorts
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
## # A tibble: 14 × 4
## # Groups: member_casual [2]
## member_casual weekday number_of_rides average_duration
## <chr> <ord> <int> <dbl>
## 1 casual Sun 18652 5061.
## 2 casual Mon 5591 4752.
## 3 casual Tue 7311 4562.
## 4 casual Wed 7690 4480.
## 5 casual Thu 7147 8452.
## 6 casual Fri 8013 6091.
## 7 casual Sat 13473 4951.
## 8 member Sun 60197 973.
## 9 member Mon 110430 822.
## 10 member Tue 127974 769.
## 11 member Wed 121902 712.
## 12 member Thu 125228 707.
## 13 member Fri 115168 797.
## 14 member Sat 59413 974.
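The same kind of summary can be cross-checked in base R. The sketch below uses toy rows (timestamps taken from the output earlier in this report, group labels and durations chosen only for illustration) and builds a count table and a table of mean durations per user type and weekday. It uses the locale-independent weekday number "%u" instead of wday(), which is an assumption of this sketch.

```r
# Toy rows; labels and durations are illustrative only.
trips <- data.frame(
  member_casual = c("casual", "member", "member", "casual", "member"),
  started_at    = c("2019-01-01 0:04:37", "2019-01-01 0:08:13",
                    "2020-03-07 15:25:55", "2020-03-08 13:03:57",
                    "2020-03-07 18:02:45"),
  ride_length   = c(390, 441, 2888, 1710, 633)
)

# "%u" gives the weekday number: "1" = Monday ... "7" = Sunday.
trips$weekday <- format(as.Date(trips$started_at), "%u")

# Number of rides per user type and weekday.
rides <- table(trips$member_casual, trips$weekday)
# Mean duration per user type and weekday; empty combinations are NA.
avg <- tapply(trips$ride_length, list(trips$member_casual, trips$weekday), mean)

print(rides)
print(avg)
```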
4.5. Visualization
4.5.1. Number of rides by user type and day
alltrips_2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, weekday) %>%
ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
geom_col(position = "dodge")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
4.5.2. Average ride duration by user type and day
alltrips_2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
arrange(member_casual, weekday) %>%
ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
geom_col(position = "dodge")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
table(alltrips_2$member_casual)
##
## casual member
## 67877 720312
Step 5: Analysis and Recommendations
The analysis in Step 4 yields the following findings:
Members account for most rides: 720,312, or 91.39 percent of all rides.
Members ride mostly from Monday to Friday. Their ride counts on each of these five weekdays are about twice their counts on Saturday or Sunday. Saturday and Sunday account for only 8.25% and 8.36% of member rides, while each weekday accounts for between 15.33% (Monday) and 17.77% (Tuesday).
By contrast, casual users account for only 67,877 rides (8.61 percent of all rides). They ride most on Saturday and Sunday: their ride counts on those two days are 1.88 and 2.61 times their average count on the other five days.
Although casual rides are far fewer, they are much longer: 5,372.78 seconds (89.55 minutes) on average, about 6.76 times the member average of 795.25 seconds (13.25 minutes).
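The shares and ratios in these findings can be recomputed from the Step 4 output. The following sketch copies the ride counts from the weekday table in 4.4 and the mean durations from 4.2; the variable names are hypothetical and chosen for this example.

```r
# Ride counts copied from the weekday table in 4.4 (Sunday through Saturday).
weekday <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
member  <- c(60197, 110430, 127974, 121902, 125228, 115168, 59413)
casual  <- c(18652, 5591, 7311, 7690, 7147, 8013, 13473)
names(member) <- weekday
names(casual) <- weekday

# Share of each user type in all rides, in percent.
total <- sum(member) + sum(casual)
share_of_total <- round(100 * c(member = sum(member), casual = sum(casual)) / total, 2)

# Share of each weekday in member rides, in percent.
member_share_by_day <- round(100 * member / sum(member), 2)

# Casual rides on Saturday and Sunday relative to the Monday-Friday average.
workdays <- c("Mon", "Tue", "Wed", "Thu", "Fri")
casual_weekend_ratio <- round(casual[c("Sat", "Sun")] / mean(casual[workdays]), 2)

# Mean durations in seconds, copied from 4.2.
duration_ratio <- round(5372.7839 / 795.2523, 2)

print(share_of_total)
print(member_share_by_day)
print(casual_weekend_ratio)
print(duration_ratio)
```

Note that the weekday percentages are shares of member rides, not of all rides, and that Monday, not Friday, is the weekday with the smallest member share.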
Recommendations: To improve the chances of converting casual users into members, the company should:
1. Raise the price of single-ride and full-day passes, especially for the second 30 minutes of a ride. This is a disincentive to remaining a casual user: riders who want a lower price would sign up for a membership.
2. Offer door prizes to members who ride on Saturday and Sunday, the two days on which casual users ride most. A member-only door prize is an incentive for casual users to sign up.
3. Step up the distribution of promotional material on Saturday and Sunday.
Step 6: Export a summary file for further analysis
counts <- aggregate(alltrips_2$ride_length ~ alltrips_2$member_casual + alltrips_2$day_of_week, FUN = mean)
write.csv(counts, file = 'avg_ride_length.csv')
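As written, the exported file carries column names such as alltrips_2$ride_length plus a column of row numbers. A minimal sketch of a tidier export follows, on toy data with made-up durations: the data argument of aggregate gives plain column names and row.names = FALSE drops the row numbers. The names trips, avg_ride_length, and out_file are hypothetical, and the file is written to a temporary path only for this example.

```r
# Toy data; the durations are invented for illustration.
trips <- data.frame(
  member_casual = c("casual", "member", "casual", "member"),
  day_of_week   = c("Sunday", "Sunday", "Monday", "Monday"),
  ride_length   = c(5000, 900, 4700, 800)
)

# data = ... gives plain column names instead of "trips$ride_length".
avg_ride_length <- aggregate(ride_length ~ member_casual + day_of_week,
                             data = trips, FUN = mean)

out_file <- tempfile(fileext = ".csv")  # temporary path, used only for this sketch
write.csv(avg_ride_length, file = out_file, row.names = FALSE)

exported <- read.csv(out_file)
print(exported)
```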
