class: title-slide-custom .center-title[ # You’re a Parselmouth, Harry! ## Speaking python inside of R workflows ### Alec Wong • satRdays Columbus • 2021-10-02 ] ??? # Welcome # Going to go over the magic of {reticulate} --- > Any sufficiently advanced technology is indistinguishable from magic. -Arthur C. Clark ??? # For these 15 minutes, `{reticulate}` will be indistinguishable from magic. (Arthur C. Clarke). # We enter the wizarding world of R programming. --- .center2[![](https://i.redd.it/vnoncsyyw1ly.png)] ??? # Wife and I are reading through Harry Potter # The theme is appropriate, if unoriginal. # I'll take it to imply that if Harry is speaking python, Gryffindor speaks R by default. --- # Reticulate * Rstudio develops the package! * Use a REPL (python console) directly in Rstudio! * Convert python objects into R objects, and vice-versa! * View python environment right from your environments pane! * Can stick with R syntax! ```r install.packages("reticulate") ``` ??? # Package allows you to run python code, from your R session. --- # Reticulate - the basics .center2[ ![](https://c.tenor.com/X1IRBLDO5PEAAAAC/harry-potter-snake.gif) ] ??? # For those of you who haven't used python before... use what you know! # Continue to use Rstudio. --- # Setting up ```r library(reticulate) # what environments are present? envs <- conda_list() # Create an environment, latest python, with a compatible pandas package. conda_create('reticulate-satrday', packages = c('python=3.9', 'pandas=1.3.3')) # use the env we just created: use_condaenv('reticulate-satrday', required = TRUE) ``` Environments are a way to minimize package conflicts between your projects. I can start with a clean installation of python, and any packages I require. ??? # One way to get set up # Conda is a package management system for python # --- # Using python directly We can start a REPL (read-evaluate-print-loop) to do python things directly in a python console, just like in the R console. ```r repl_python() ``` ``` Python 3.9.7 (C:/Users/AlecW/miniconda3/envs/reticulate-satrday/python.exe) Reticulate 1.20 REPL -- A Python interpreter in R. ``` ``` exit ``` ??? # DEMO --- # Using python in R Markdown .pull-left[ I can run python directly, in this code chunk! ```python # This is native python code, run in R Markdown! import platform platform.platform() ``` ``` 'Windows-10-10.0.19043-SP0' ``` ] .pull-right[ I can also use python modules directly from R: ```r # # this is R! platform <- import('platform') platform$platform() ``` ``` [1] "Windows-10-10.0.19043-SP0" ``` * `$` used here is the equivalent to the method accessor `.` in python. ] ??? # Go through comparison # Python and then R # Then show the R Markdown --- ## Accessing python stuff from R ```r # R py_run_string('import numpy as np') py_run_string('vec = np.arange(1,10).tolist()') # Access python objects from R! py$vec ``` ``` [1] 1 2 3 4 5 6 7 8 9 ``` Compare to the same object, generated in R: ```r # R rvec <- seq(1,9) all.equal(rvec, py$vec) ``` ``` [1] TRUE ``` ??? # Reticulate provides py object # Converts on the fly --- ## Accessing R stuff from python Now we'll load the `CO2` dataset, which is a `data.frame` built into R. ```r # R data(CO2) ``` I can use the `r` module to get items from R, the same way I used the `py` object in R to get other stuff from the active python session. .pull-left[ **This is python** ```python # Use the R object in python df_in_python = r.CO2 df_in_python.head() ``` ``` Plant Type Treatment conc uptake 0 Qn1 Quebec nonchilled 95.0 16.0 1 Qn1 Quebec nonchilled 175.0 30.4 2 Qn1 Quebec nonchilled 250.0 34.8 3 Qn1 Quebec nonchilled 350.0 37.2 4 Qn1 Quebec nonchilled 500.0 35.3 ``` ] .pull-right[ **This is R** ```r # Use the python object in R # head(py$df_in_python) ``` ``` Plant Type Treatment conc uptake 1 Qn1 Quebec nonchilled 95 16.0 2 Qn1 Quebec nonchilled 175 30.4 3 Qn1 Quebec nonchilled 250 34.8 4 Qn1 Quebec nonchilled 350 37.2 5 Qn1 Quebec nonchilled 500 35.3 6 Qn1 Quebec nonchilled 675 39.2 ``` ] ??? # Go the other way now # In python there's an r module, to get stuff from R, for use in python # Note pandas conversion # Note types are maintained, ordered factors --- .center-div[ ![](https://i.pinimg.com/originals/d5/1c/d2/d51cd27e27aaca79c3066dfa80a5b42b.gif) Using `reticulate` with `dplyr`, sometimes you won't notice you're using python. ] ??? # ENHANCE # Use python with dplyr --- # Life example ## ETL routine @ work * Python module written to interface with MS OLAP cubes. `pyadomd` * Bunch of functions written to handle the query and transform data. * Used R's `DBI` package to upload to SQL Server (most convenient API!). --- # Code setup, imports ```r # Set up conda reticulate::use_condaenv("foresight_etl", required = TRUE) # Python modules foresight_module <- reticulate::import("Foresight") # A package I wrote # Python functions foresight <- foresight_module$query$Foresight() # Class, with methods in it add_year_month_cols <- foresight_module$ffc2$add_year_month_cols # Function convert_state_code <- foresight_module$utils$convert_to_state_code # Function match_wm_line_cov <- foresight_module$utils$match_wm_line_cov # Function ``` * Using the import function and `$` accessor to get the functions out of my module. ??? # Going through the process again, but with worked example # Keep in mind, no R here! --- # Workflow ```r # Data pull - execute method from the 'foresight' python class ff_orig <- foresight$query_feature_forecast(archive_dates = list("CUR FORECAST")) # Transformations ff <- ff_orig %>% * add_year_month_cols() %>% #PY mutate( * StateCode = convert_state_code(PLN_ST_CD), #PY * WM_LINE_COV = match_wm_line_cov(LineCoverage), #PY ..., ) dbWriteTable(con, table_name, ff, overwrite = TRUE) ``` * The functions written in python "just work" when applied to a dplyr pipe workflow. --- # What makes these functions work? For use *outside* of a dplyr verb ... ```python *def add_year_month_cols(data: pd.DataFrame): * _data = data.copy(deep=True) _data['Month'], _data['Year'] = separate_year_month(_data.Acct_Dt_Lvl_2) _data['Month'] = convert_month_abbr(_data.Month) _data['YearMonth'] = _data.Year.astype(str) + _data.Month * return _data ``` Try to operate as R would. * Takes a data frame * Performs a deep copy of the data frame * (R does not alter the data frame from parent environment, only operates on a copy of the df within function environment) * Returns a modified copy of the data frame --- # What makes these functions work? For use *inside* of a dplyr verb ... ```python *def match_wm_line_cov(line_coverage_list): line_cov_match = { "BI": "INJURY", "COLL": "PD", "COMP": "PD", "MP": "MEDPAY", "OTHER": "OTHER", "PD": "PD", "PIP": "PIP", "UMBI": "INJURY", "UMPD": "PD" } * wm_lc = [line_cov_match[x] for x in line_coverage_list] return(wm_lc) ``` * Takes a vector (a list, in python parlance) * Returns a vector * `reticulate` handles the transformations automatically! --- .center2[ ![](https://scontent-iad3-2.xx.fbcdn.net/v/t1.18169-9/404164_452171128147964_431862419_n.jpg?_nc_cat=111&ccb=1-5&_nc_sid=09cbfe&_nc_ohc=HwTpGWhswXAAX-h4M1y&_nc_ht=scontent-iad3-2.xx&oh=786a13442e7643aae66cd4fd01d63fc6&oe=61735766) ] ??? # As Gilderoy Lockhart says, AMAZING # I was shocked when I first ran this, expected error # Rstudio nothing short of wizards --- # Don't forget! <img src='assets/obliviate.gif' height=300px width=650px> ??? # For those that know, this is especially hilarious because the spell "obliviate" is used, and backfires --- # Don't forget! <img src=https://img.wattpad.com/30a7ba68e7c6cd6155169f335e33d434fcb0f7e5/68747470733a2f2f73332e616d617a6f6e6177732e636f6d2f776174747061642d6d656469612d736572766963652f53746f7279496d6167652f79347a30417a4646724c66494f773d3d2d3235382e313635346666356630633265633233643736303331363237323838302e676966 height=300px width=650px> ??? # And the dude promptly forgets everything --- # Don't forget! ### Caveats and considerations * Make sure to use `required=TRUE` when calling `use_condaenv`! * It's a good practice to export your environment to a text file. * `reticulate` doesn't have any functions yet to export/import an environment from file. * You'll have to use the command line to do this. ```shell # Export the current environment (all packages & versions saved to computer-readable text file) conda activate {environment name} conda env export > environment.yml # Install the environment from the file definition conda env create -f environment.yml ``` * Not all python objects are transferable to R, and vice-versa. [Here's a list.](https://rstudio.github.io/reticulate/#type-conversions) --- .flash[ # 500 points to Gryffindor! ] .center2[ <img src='https://media4.giphy.com/media/JX8dov1ZPeKTm/giphy.gif?cid=790b761149cb04cfe43b6ec5c3bc73925bb0272f07dafde2&rid=giphy.gif&ct=g' style='height:500px;'> ] --- # Thank you! ## Resources * <img src='https://static.wikia.nocookie.net/naruto/images/5/56/Sharingan_Triple.svg/revision/latest/scale-to-width-down/300?cb=20091022225716' width=20px height=20px> `xaringan` for slide creation • Yihui Xie * <img src='https://rstudio.github.io/reticulate/images/reticulated_python.png' width=20px height=20px> `reticulate` • Kevin Ushey, J.J. Allaire * <img src='https://raw.githubusercontent.com/tidyverse/dplyr/master/man/figures/logo.png' width=20px height=20px> `dplyr` • Hadley Wickham, Romain François, Lionel Henry, Kirill Müller * <img src='https://docs.conda.io/en/latest/_images/conda_logo.svg' width=20px height=20px> `miniconda` • Anaconda, Inc. <hr> *Find the presentation source at:* [https://github.com/awong234/reticulate_pres](https://github.com/awong234/reticulate_pres) [Slideshow itself](https://awong234.github.io/reticulate_pres)