Jupyter & Nteract Open Source Development

About

In this episode of Open Source Directions, we were joined by Matthew Seal, who talked about the work he has been doing with Jupyter and nteract. Matthew also discussed common Jupyter tools and how they have been adopted for various use cases in the wild.

Transcript

0:00
[Music] Hello, the internet, welcome to Open Source Directions,
0:06
brought to you by OpenTeams, the business-to-business marketplace for all your open source support and
0:11
services. I'm your host, Henry Badgery, and joining me as the co-host is...
0:16
Hi, I'm Tony Fast. I'm a data scientist and open source consultant at Quansight
0:22
and a huge Jupyter fanboy, and I'm really excited to introduce
0:27
Matthew today. Hi, yeah, I'm Matthew Seal. I was formerly at Netflix,
0:36
and before that OpenGov, and most recently I started a new company called Noteable,
0:41
which works in the Jupyter ecosystem to build SaaS solutions. My background has been in
0:47
doing work on big data teams and data infrastructure systems, and helping with integration so people can
0:54
more easily do their jobs. Awesome. Well, thank you very much for joining us today. We're very glad that you decided
1:00
to come on the episode. Now we move on to our famous Tweet of the Week section,
1:05
where each of our panelists presents a tweet that they've been enjoying recently. So Matthew, you're up
1:12
first. Yeah, I've got two of them. I'm not sure how we're actually sharing them on the
1:17
show, but one of them was just the announcement that we
1:23
were making a new company: we just founded it, we've got funding, and
1:29
we're off to the races to help people with using Jupyter. And then the second one, which was a
1:34
little unrelated to me and more about what I've enjoyed in the world: I copied the tweet
1:40
from SpaceX about the launch that successfully put astronauts into space. That was so cool to me; I was
1:45
really happy to see that happen again in my lifetime. I was getting worried.
1:50
Awesome. Tony, do you have something to share? I just signed out of all of my social media
1:57
in the past week, and I'm spending a lot of time checking in on my people, and I
2:02
hope that y'all are doing the same, listening and making sure everybody's staying safe and
2:09
healthy physically. Yeah, I definitely agree.
2:14
That's a very timely thing to say. My tweet of the week is, I think, probably the
2:19
same tweet as yours, maybe a little bit different, but it's the first time NASA astronauts actually entered the space
2:24
station from a commercially built spacecraft. To me it sort of got lost in the
2:31
noise of everything that's happening right now, but it's such a significant moment, because this is going to be
2:37
a genuinely new time for space exploration. We've had almost a dark period on the government side, where not a
2:44
lot has happened, and now we've got the commercial side of it, and the economic interest there, I think, is
2:49
really going to drive something and change it. So I just wanted to share that. Now we're going to get on to
2:55
introduce the project and give you a little more information and context. So Jupyter is an organization
3:03
and a collection of repositories and protocols formulated around the notebook
3:09
offering. nteract is a branch of exploratory solutions and libraries within the larger
3:15
Jupyter ecosystem. And some of the stats for the two popular packages
3:20
from each: we've got around 10,200 stars for Jupyter
3:27
on GitHub and 3,300 stars for papermill, so as you can tell, they're very
3:32
popular projects. On PyPI and conda, we have
3:37
around 9.2 million downloads for Jupyter and around 218
3:43
thousand for papermill. Isn't that incredible? And some of the large companies, which I thought was very interesting,
3:49
that are using and directly contributing to the software are Netflix,
3:55
Bloomberg, IBM, Microsoft, Amazon, and Google, which we all know very well. So there are
4:01
some big names, and I can't wait to get into this episode and learn more.
4:06
Yeah, it's exciting to interview you on this, Matt. It's
4:12
super cool to see how popular the nteract and papermill projects have become.
4:18
So I was curious: what was the motivation for starting these projects,
4:24
how long ago was that, what kind of need were you all trying to fill, and is that mission
4:29
still holding true? Yeah, I think I'll talk a little bit about the different projects, the story
4:35
arcs here and where they came from, but I think there are also
4:40
a couple of different points of need where things inflected. To give a brief moment
4:45
on the overall history: when I got more involved in the Jupyter ecosystem
4:51
space, there was an inflection point in making it useful for more cases at companies.
4:56
There was usage, but there were some gaps, and
5:02
there were some friction points that made people say, "Oh, you'd never use this for real code." That was kind of what was thrown around,
5:08
and we were able to build some tools to turn that around and say, no, you really can use it for real code; you just need to
5:13
solve a few of the smaller problems, and then it can be a very useful tool. So, some history there
5:20
about Jupyter and nteract: Jupyter is the organization where the original notebook code came from.
5:26
There are other notebook offerings out there, which we'll talk about later, but it was founded out of the work on
5:31
the IPython project, which is a project around improving
5:36
interaction with Python and centered on making a better interactive coding experience, especially for exploring
5:42
results, across many different programming languages. A lot of people don't know that Jupyter is actually not isolated to
5:48
just Python. It's built in a way that any language can run inside of it, and there are kernels for pretty much every language
5:54
available. A kernel is the term for the REPL, the back end that runs the code.
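One concrete piece of the language-agnostic design is the kernelspec: a small `kernel.json` file telling a Jupyter front end how to launch the process that runs code for a given language. The sketch below writes one with the standard library only. The directory name `my-python` and the display name are hypothetical examples; in practice a command such as `python -m ipykernel install` generates this file for you.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path


def write_kernelspec(base_dir: Path, name: str, argv: list[str],
                     display_name: str, language: str) -> Path:
    """Write a minimal kernel.json into base_dir/name and return its path."""
    spec_dir = base_dir / name
    spec_dir.mkdir(parents=True, exist_ok=True)
    spec = {
        # Command used to start the kernel process; Jupyter replaces
        # "{connection_file}" with the path of a JSON file holding ports and key.
        "argv": argv,
        "display_name": display_name,  # label shown in the notebook UI
        "language": language,          # language the kernel executes
    }
    path = spec_dir / "kernel.json"
    path.write_text(json.dumps(spec, indent=2))
    return path


# Hypothetical example: a spec for the IPython kernel, written to a temp dir
# rather than to a real Jupyter data directory.
with tempfile.TemporaryDirectory() as tmp:
    spec_path = write_kernelspec(
        Path(tmp), "my-python",
        ["python", "-m", "ipykernel_launcher", "-f", "{connection_file}"],
        "Python 3 (example)", "python",
    )
    print(spec_path.read_text())
```

Pointing `argv` at a different kernel implementation (IRkernel for R, IJulia for Julia, and so on) is how the same front end ends up driving other languages.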
6:00
And then nteract was actually formed by a group of Jupyter devs who wanted to explore some new patterns, develop
6:05
new tools, and try to build some experimental, opinionated pieces of software. Some of those have
6:11
been really successful, like papermill. I think the nteract component library has been used a lot more on the corporate side, maybe not as much on the
6:18
individual side, but they've had lots of successes. Okay, I didn't know that Jupyter
6:23
actually supported all these other languages, so that's interesting. I guess you learn something new every day. I've only ever used it for
6:28
Python, so it's interesting to learn that. Oh, there's so much.
6:36
Is that common knowledge to a lot of people? It's decently well known. If you look at papermill, actually,
6:42
inside it's got, I think, eight or ten language translators, so you can pass parameters
6:47
and it knows how to convert them for each of those eight or ten programming languages. Okay, awesome. Yeah, I'm also interested in
6:53
learning more about the history of the name and the logo.
6:59
Yeah, so Project Jupyter is a reference specifically to the three core programming languages that were
7:04
supported by Jupyter at the very beginning, which were Julia, Python, and R. And then it was,
7:10
as I understand it, an homage to Galileo's notebooks recording the
7:16
discovery of the moons of Jupiter. So the logo is based on Jupiter's moons,
7:21
and it has that history as a sort of play on words. nteract decided to go a little
7:26
further with plays on words, and you'll notice all their projects are
7:32
different phrasings that poke fun at the metaphor behind the scenes,
7:39
like papermill and scrapbook and bookstore, and the name nteract is also
7:44
in that same vein: it's just a play on words for interaction.
7:52
We've got Tony muted. Sorry, I'm just going to unmute you, Tony. Oh, there we go. I muted you, I apologize.
8:03
...overlaps with a lot of technology and a lot of solid ideas out there. Are there any
8:11
alternative projects that apply in this space? Yeah, so the notebook space is fairly
8:17
rich. It's actually gotten a lot richer in the past few years, in the sense that there are a lot of competing ideas. Most of the
8:24
notebook offerings you see out in the wild nowadays are Jupyter-based or extensions
8:30
of Jupyter, so they follow the ipynb format or the specs or both, but there are some others that don't.
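The ipynb format mentioned here is plain JSON following the nbformat v4 schema. As a rough sketch using only the standard library, a minimal notebook with one markdown cell and one code cell looks like the dict below; the cell contents and `id` values are illustrative, and in real code the `nbformat` package is the usual way to create and validate these documents.

```python
import json


def minimal_notebook(code: str, note: str) -> dict:
    """Build a minimal nbformat v4.5 notebook as a plain dict."""
    return {
        "nbformat": 4,
        "nbformat_minor": 5,  # minor version 5 added per-cell "id" fields
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3",
                           "language": "python"},
        },
        "cells": [
            {"cell_type": "markdown", "id": "intro", "metadata": {}, "source": note},
            {
                "cell_type": "code",
                "id": "first-code",
                "metadata": {},
                "source": code,
                "execution_count": None,  # not executed yet
                "outputs": [],            # results are stored next to the source
            },
        ],
    }


nb = minimal_notebook("print('hello')", "# A tiny notebook")
text = json.dumps(nb, indent=1)  # what would be written to disk as a .ipynb file
print([c["cell_type"] for c in json.loads(text)["cells"]])  # ['markdown', 'code']
```

Because outputs live in the same JSON document as the source, any tool that follows the spec can read, render, or execute the same file.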
8:36
In particular, there was some competition early in the days of Jupyter with Zeppelin, and then there's
8:41
been Observable and a bunch of others that have taken new stabs at what a notebook experience should be,
8:47
maybe with a different way of approaching it, so it's a healthy ecosystem of different options. Okay.
8:55
One thing I'm going to remind the audience: please feel free to ask questions in the chat, and we'll get to answering those towards
9:01
the end of the episode, so ask whatever you'd like. One thing I
9:06
was also curious about is what technology it's built on. Yeah, so there are a number of technologies involved in
9:12
each component. A few notable pieces of the stack
9:18
across nteract and Jupyter: the kernels use ZeroMQ for their communication protocol,
9:23
and the Jupyter servers, all the open source ones, are Tornado-based, so they're built on Tornado
9:30
technology. The front end for nteract, for example, is written with React and TypeScript.
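To give a feel for what travels over ZeroMQ between a front end and a kernel: in the Jupyter messaging protocol, each message is a header, parent header, metadata, and content, each serialized as JSON and signed with an HMAC (SHA-256 by default) using the key from the kernel's connection file. The sketch below builds and signs an `execute_request` with the standard library only. It opens no sockets, and the key, session, and username values are placeholders.

```python
import hashlib
import hmac
import json
import uuid
from datetime import datetime, timezone

DELIM = b"<IDS|MSG>"  # separates routing identities from the message frames


def sign(key: bytes, frames: list) -> bytes:
    """HMAC-SHA256 over the serialized header, parent_header, metadata, content."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    for frame in frames:
        mac.update(frame)
    return mac.hexdigest().encode("ascii")


def execute_request(key: bytes, session: str, code: str) -> list:
    """Return the multipart frames for an execute_request (nothing is sent)."""
    header = {
        "msg_id": uuid.uuid4().hex,
        "session": session,
        "username": "example",  # placeholder
        "date": datetime.now(timezone.utc).isoformat(),
        "msg_type": "execute_request",
        "version": "5.3",
    }
    content = {
        "code": code,
        "silent": False,
        "store_history": True,
        "user_expressions": {},
        "allow_stdin": False,
        "stop_on_error": True,
    }
    # Order matters: header, parent_header, metadata, content.
    frames = [json.dumps(part).encode("utf-8") for part in (header, {}, {}, content)]
    return [DELIM, sign(key, frames), *frames]


msg = execute_request(b"placeholder-key", "placeholder-session", "1 + 1")
print(json.loads(msg[2])["msg_type"])  # execute_request
```

The kernel recomputes the same HMAC on receipt and rejects messages whose signature does not match, which is why the key lives only in the connection file.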
9:36
And then some of the other front ends have some of the older tech in there, or other branching
9:41
tech, different choices made at different points. And then
9:46
Python is what most of the back end is written in, which is actually where I spend most of my time, so you'll notice I'll be like, yeah,
9:52
there are some front-end things, but I'm mostly on the back end. You'll see Python used
9:58
pretty extensively for all the server code; papermill, for example, is written in Python, even though it translates the requests into
10:04
the different languages. Well, you are a fortunate man to be a Python developer
10:11
who isn't writing JavaScript right now. Oh, I am writing some JavaScript right
10:16
now. On the JavaScript side, I think TypeScript and React
10:23
together have made big improvements to the quality of life of writing JavaScript. Nice, nice. Yeah, it seems to impact
10:30
large organizations developing these technologies. So let's go back a little bit.
10:36
Who set the foundations for the work that you're doing today, and
10:44
who are some of the key and influential players who made these technologies happen?
10:49
Yeah, so I could go on for a very long time about the number of people who've been involved and helped. There are
10:56
a lot of people here, and we'll talk about that later, but for the original founding, if you want the history
11:01
of it: Fernando Pérez was a founder of the organization in 2015,
11:07
and there were a lot of other early contributors, but he started this out of
11:13
the IPython development group and decided to make it more general. There are
11:18
people in the chat who know more about this history than me, so I won't go too far on that, because I'll probably say the wrong thing. I
11:24
actually joined a little later in the story arc, maybe around 2018, when I
11:29
started to really get involved and contribute some code, and then slowly became a maintainer of more parts and helped with
11:36
more things. Okay, it's great to see that you're a maintainer, and also to learn about who started it. I'm sure there's a long
11:42
history, especially with open source projects, of people jumping in and jumping out, but who maintains the project currently?
11:49
So there are dozens of people across all the projects. I help a lot with the back-end library
11:56
maintenance, so I kick off most releases for the different Jupyter back-end parts:
12:01
the individual libraries, not the notebook servers themselves, so things like nbconvert and nbclient and jupyter_client and all
12:09
these supporting libraries. So I do a lot of maintenance there, and there's a handful of people
12:14
helping on that side. On the nteract and papermill side of things,
12:19
I'm doing most of the Python maintenance. We'd love more maintainers there; there are a few people starting to contribute more again,
12:25
so it'd be awesome to see more people there. And then on the
12:31
other front, for the front ends and the servers and JupyterHub and ipywidgets, there are
12:37
more than a dozen people who all contribute there. And how can
12:42
people get involved? Yeah, it's actually easier than a lot of people think; they get scared to get involved. I would
12:49
say the first way is to find an issue to help with. There are lots of issues, and on some of the projects we try to tag
12:55
them as good first issues for first-time contributors.
13:01
If you don't see any on a project, ping us; for nteract, that's the Slack group, or
13:08
Gitter, or the email group, and maybe someone can go find some that come up and point you to some early
13:13
starter ones. I think even just creating issues is a really good way to get started. If you create an issue like "Oh, I
13:19
ran into this thing," describe it well, and maybe try to find out why a little bit, you don't have to solve the problem, but you're getting involved,
13:26
and it helps us a lot. Rocking. So you all have contributors from all over the
13:32
place. Oh yeah, the Jupyter Discourse is another one, sorry. How has Discourse been working for you? What kind of
13:38
advantages has Discourse provided you? I know that this isn't part of the question, but... It
13:45
actually makes a nice little... you know, it's not like Stack Overflow, but it gives a nice place to have
13:50
conversations and announcements in the same place. I've been using it more and more. I need to set myself up as a better
13:56
watcher, because I sometimes miss things and someone pings me, like, "Oh, there's this question from a month ago," and I'm like, oh shoot, I missed that.
14:02
But I think it's actually better than a lot of the other forums we've had, because you have a good history of a topic,
14:09
and you can comment on the topic and see what other people said, whereas with some of the others there's a message and then it goes away
14:14
or gets scrolled past in the history. So I think it gives a little better community feel around how to ask and
14:20
interact. Great. Yeah, it seems like you have to have a significant community to build that kind of discussion. So
14:26
where are your contributors and users coming from, and how are they finding you all? Yeah, so
14:33
the very early authors and users of notebooks were in the sciences and academic institutes. You see
14:40
a lot of support from Berkeley and other academic groups, and from researchers
14:48
on the research side. Gradually, it's been more and more corporate app developers,
14:55
local app developers, and data systems folks starting to chip in and help and
15:01
contribute more. So at this point there are users across the board: academic, scientific,
15:06
corporate. Even, I think, we've had
15:12
contributors in non-tech spaces too, which is kind of neat. There's education software, and
15:19
machine learning flows and ETL pipelines being built with it. There's satellite monitoring software that I know uses papermill,
15:25
which made me think, oh boy, I really hope I wrote that well. If it came crashing down... yeah, I don't
15:33
want that on me. And across Jupyter, we get contributions from users from all
15:39
different backgrounds. The other day someone contributed something, and I had no idea until I saw a tweet where someone was saying "Oh, good job," and it
15:45
was from someone who was a journalist, and they committed a bug fix to a low-level library, which I thought was cool.
15:51
Wow. So it seems like there are a lot of people coming from plenty of different backgrounds. We're talking even space, which is
15:57
very interesting to see. I too hope you programmed that well. But on that topic, is the project
16:03
participating in any diversity and inclusion efforts? Yeah, they do, and
16:09
it's a constant effort. I'm unfortunately not going to be the person who knows all of these, but I'll mention a few I know they've been involved in, a
16:16
little closer to things I've seen while I was involved in the project. In the Jupyter org especially, I'm not as
16:22
aware of all the efforts, but I know they did the outreach program with Berkeley, which was a really good diversity program,
16:28
and they do get involved in a lot of meetups for different diversity groups. I'm not as
16:33
familiar with all of those; I've only been to a couple on the Jupyter side. On the nteract side,
16:39
more recently we've had more things with Google Summer of Code. We have three interns this year, so that's a nice
16:45
program for getting people into areas that maybe normally wouldn't be a tech job. I'm really excited for those; they're working on cool things, and they're
16:51
fun people. I really enjoy them. And then Hacktoberfest is
16:56
done every year, and that gets a lot of people involved from groups that normally maybe
17:03
have difficulty getting in. And then Women Who Code was a recent online event that nteract was involved with.
17:09
Okay, great, thank you. Yeah, and just to emphasize too,
17:15
both groups try to be open. nteract especially has it in its
17:21
mantra that it's really about inclusion, with open community being central to the org. So we try to be as open
17:28
as we can and let people explore and work on things as best they can.
17:33
Okay, that's great. Yeah, because I find that having first issues, and making it
17:38
very easy and lowering the barriers for people to actually enter and start, is something which I think
17:45
is core to the success, at least the early success, of a lot of projects. I just want to remind everyone who's
17:51
listening: please do not hesitate to ask any questions you want in the chat, and we'll be getting to those very
17:57
soon. But now we're going to shift gears and give a project demo,
18:03
so we'll be able to see some very cool features of nteract and basically how it works. Matthew's going
18:09
to take us through that. So while Matt is getting set up, I would like to take this opportunity to thank our sponsor,
18:15
Quansight, for sponsoring this episode of Open Source Directions. So thank you very much, Quansight.
18:21
Quansight: creating value from data. So Matt, when you're ready, take it away. Awesome, okay, so
18:29
let's go ahead and go over to the nteract piece.
18:35
I'm going to do a quick... I'm assuming people can see my screen.
18:43
So here I'm just going to show you a little bit of what a notebook might look like, and what the kind of
18:51
interaction might be in a notebook, and walk through a really basic example. And I'll
18:56
show you a really basic example of papermill running, just to show what the product looks like,
19:02
and then we'll talk more about the tools that are available and walk around there. So I've got an
19:08
environment here: I've already got a Python environment, and I've already launched this nteract
19:14
notebook here; I just created a new one. nteract is just a UI,
19:20
maybe one of the UIs that could use some more love, more contributors, because I think it has a
19:25
real niche place in the ecosystem. There's the classic UI, which you might be more used to, and JupyterLab, which on the
19:33
Jupyter side is a more robust interface with lots of different flags and options. So
19:39
nteract is a stab at an opinionated, clean interface, and we'd love to see more contributors, so if you're thinking about
19:45
places to help, this would be a good one. So we're going to real quick load a data frame,
19:52
a pandas DataFrame, and show a visualization with it, and make that really easy. So what do we have to do here? What we're going to do is
20:00
pull in vega_datasets, which has these cool, kind of iconic datasets, the kind of data that was collected and
20:05
used for early data analysis conversations, and then we're going to import...
20:15
And then we're going to set this little flag here that I'm going to copy in. This flag is just to turn on a feature
20:23
called Data Explorer. If you don't do this, you'll still get outputs, but this will give us a visualization automatically whenever we
20:29
print a pandas DataFrame. Oops.
20:36
Okay, let's see what I did. I did not install it into this environment.
20:44
Let's try... And you just had "spd," so all you have now...
20:52
Oh, yeah, yeah, I just had the wrong one. There we go, sorry. Cool. I like being like, "pandas, help me
20:58
do this." Yeah, we got it, so all good. And then
21:03
what we're going to do here is pull a dataset from this. This is really
21:09
simple, right: data.cars. So cars is this little dataset that,
21:14
in the data, shows cars from the 70s and 80s and some stats about them.
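The transcript does not record the exact flag Matthew copies in. Assuming it is pandas' `display.html.table_schema` option, which is the usual way to feed nteract's Data Explorer, enabling it makes notebook front ends receive DataFrames as Table Schema JSON (the `application/vnd.dataresource+json` mimetype) that Data Explorer renders. The demo's data came from `vega_datasets` (`from vega_datasets import data; data.cars()`); this sketch uses a tiny placeholder DataFrame instead so it runs with pandas alone, and shows the Table Schema payload directly.

```python
import json

import pandas as pd

# Assumption: this is the flag used in the demo. With it enabled, notebook
# front ends receive DataFrames as Table Schema JSON for Data Explorer.
pd.set_option("display.html.table_schema", True)

# Placeholder rows using cars-style column names; the values are made up.
df = pd.DataFrame({
    "Name": ["car a", "car b", "car c"],
    "Miles_per_Gallon": [18.0, 26.0, 31.0],
    "Horsepower": [150, 90, 65],
})

# The Table Schema payload pandas builds for the dataresource mimetype.
payload = json.loads(df.to_json(orient="table"))
print([field["name"] for field in payload["schema"]["fields"]])
# ['index', 'Name', 'Miles_per_Gallon', 'Horsepower']
```

Because the payload carries both a schema (column names and types) and the rows, the front end can offer charts and column pickers without any extra plotting code in the notebook.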
21:20
Which is kind of cool. So I'm going to expand the output so we can see this really well. You see by default we have this kind of
21:26
table of data. It's just rendered; it's got three pages, and we can set how many rows. It's got
21:32
horsepower, acceleration, what year it was made, so you can see all the years and stuff. So what we're
21:37
going to do is click over to one of these visualizations, and this is kind of a cool little tool for doing
21:43
visualizations on this type of data. pandas DataFrames can be displayed with this
21:50
tool called Data Explorer, another open source tool that we're going to be working on a lot in the future, so I'm pretty happy with
21:57
it. So we can do things like miles per gallon versus cylinders. It's not as interesting, because you can see there's four, six, or eight,
22:02
but if you notice, the fewer cylinders you have, generally the more miles per gallon
22:07
you get. But we can do things like, maybe, miles per gallon based
22:13
on horsepower, and this is maybe a more interesting graph, where you can see how these two things correlate.
22:20
And then we can quickly say maybe we also want the weight of the cars as the size of the bubbles,
22:26
so you can see heavy cars have high horsepower but generally fewer miles per gallon. But there are some exceptions: here we
22:32
have this tiny little Volkswagen with, I guess, 26 miles per gallon,
22:39
and it has very little horsepower, but then we have a BMW
22:44
that has more horsepower and the same miles per gallon, and it's a little heavier of a car. So
22:51
with things like this you can do quick visualizations, and you can do things like trend lines and colors and all that jazz.
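For comparison, a static version of the bubble chart described above (horsepower on x, miles per gallon on y, bubble size from weight) written directly against matplotlib might look like the sketch below. The values are placeholders rather than the real cars data, and the divide-by-20 size scaling is an arbitrary choice.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend: no display needed
import matplotlib.pyplot as plt

# Placeholder values for illustration, not the real cars dataset.
horsepower = [65, 90, 110, 150, 200]
miles_per_gallon = [31.0, 26.0, 24.0, 18.0, 14.0]
weight_lbs = [1800, 2200, 2600, 3400, 4300]

fig, ax = plt.subplots(figsize=(6, 4))
# Marker area is given in points^2, so scale weights down to readable sizes.
sizes = [w / 20 for w in weight_lbs]
bubbles = ax.scatter(horsepower, miles_per_gallon, s=sizes, alpha=0.5)
ax.set_xlabel("Horsepower")
ax.set_ylabel("Miles per gallon")
ax.set_title("Miles per gallon vs. horsepower (bubble size = weight)")

# Render to an in-memory PNG instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```

Trend lines, color encodings, and switching which columns are plotted would each add more code here, whereas Data Explorer exposes them as clicks.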
22:57
The lack of code that you put into doing this demo is astounding. [Laughter]
23:04
Yeah, Data Explorer definitely emphasizes the ability to do data exploration without having to write a lot of code. I can show
23:10
you the version of doing something similar with matplotlib, or the plethora of other programming
23:16
language options, but Data Explorer lets you explore data pretty quickly and cleanly. How many lines would it be with
23:22
matplotlib to do something like this?
23:28
To get the bubble size and all the clicks I just did, maybe
23:34
10 or 20. Wow, that's incredible. If you knew what you were doing, yeah. Somewhere in the middle, you might see the HoloViz ecosystem being a
23:41
little bit shorter, and there's a lot of awesome stuff that Altair does. I mean, once you've
23:48
done the EDA, you know a lot about your data, and that's part of the struggle that we have with
23:53
notebooks: what's in the data? All right, and here we're going to do a little more
24:00
demo. Normally you would have a dataset, you know, accessed directly by name. Instead, we're going to do this
24:06
thing where, obviously, we're going to change this to...
24:16
I'm going to do a getattr.
24:25
That's right. But instead, what we're actually going to do is use a new variable. I'm not going to put in a placeholder; you can make a
24:31
parameters cell and put a placeholder there, but for this demo I'm just going to do this real quick, like this. So we're going to do data, and we're going to
24:36
call this ds_name. All right, and we're going to save
24:44
this. What we're going to do now is change what screen we're sharing.
24:59
So one sec, we're going to change screens.
25:09
So we've got a little terminal window here. You can see the notebook server is running right above me,
25:15
and here what we're going to do is run a little papermill command. So papermill, and we're going to do this
25:22
Untitled1,
25:28
and we're going to do an output, and we're going to say this is going to be iris.ipynb, and we're going to do -p
25:36
and then, what did we call that thing? ds_name, I think.
25:46
All right, so we've now saved out a new notebook, so let me go load that up and then share
25:52
the screen. So papermill is not doing all of this
25:58
necessarily, but it's putting some nice things together to make this happen, right?
26:03
Yeah, so what papermill is doing is headless execution: it just executes that whole notebook end to end.
26:10
Let's show the screen so you can see what the output was. It executes the whole notebook as
26:15
though you were in a browser and said "Restart and Run All" on a notebook, but what it actually does is isolate the
26:21
input from the output, so here it's actually generating a new
26:28
notebook document and leaving the old one alone. Cool. So when you've had these things
26:35
running in production, how often would people be running really large
26:40
simulations with these things? Sometimes you do very simple things; sometimes you do really long things. We had a range from
26:46
things that execute in seconds to things that would execute for days, or up to a week.
26:53
And at Netflix in particular, we had moved all of our... this is
26:58
going to sound crazy, I've said this a lot before, but we moved all of our ETL to run on top of notebooks.
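Papermill runs notebooks through a real Jupyter kernel (via nbclient), so the following is only a toy model of the headless execution described a moment ago: run every code cell top to bottom, like "Restart and Run All", and write the results into a new notebook while the input document stays untouched. It substitutes Python's `exec` in one shared namespace for a kernel, captures only printed output, lets exceptions propagate, and assumes the cell layout of the nbformat v4 schema.

```python
import contextlib
import copy
import io


def run_all(notebook: dict) -> dict:
    """Execute code cells in order and return a new, executed notebook."""
    executed = copy.deepcopy(notebook)  # the input notebook is never modified
    namespace: dict = {}                # shared state, like one kernel session
    count = 0
    for cell in executed["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = cell["source"]
        if isinstance(source, list):    # nbformat allows a list of lines
            source = "".join(source)
        count += 1
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)     # toy stand-in for a kernel execute_request
        cell["execution_count"] = count
        text = buffer.getvalue()
        cell["outputs"] = (
            [{"output_type": "stream", "name": "stdout", "text": text}] if text else []
        )
    return executed


source_nb = {"cells": [
    {"cell_type": "code", "metadata": {}, "source": "x = 40",
     "execution_count": None, "outputs": []},
    {"cell_type": "code", "metadata": {}, "source": "print(x + 2)",
     "execution_count": None, "outputs": []},
]}
result = run_all(source_nb)
print(result["cells"][1]["outputs"][0]["text"], end="")  # 42
print(source_nb["cells"][1]["outputs"])                  # [] (input left alone)
```

Keeping the executed copy separate is what turns each scheduled run into its own record, rather than overwriting the template that produced it.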
27:04
So for every ETL job you'd try to schedule, if you said, "I want to run a Spark job, I want to run this data transport, I
27:10
want to run my custom code," we ran it inside of a notebook. We had notebook templates, so users could say, "Hey, I want...
27:17
here's my Spark thing to run," and we'd wrap it in a notebook. If they had a notebook already, we could just run that as well. Why was that? Why did you
27:23
run it like that? Yeah, so we actually made that shift for them, and it was kind of transparent at first, and then slowly it became... because
27:29
the emphasis there was, rather than making our users
27:35
change what they like using, to let them keep using what they'd been using without rewriting their code.
27:40
So notebook users already using notebooks didn't have to rewrite what they were doing; they could just schedule their notebook directly.
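Scheduling an existing notebook leans on papermill's parameter injection: the author can tag one cell `parameters` with default values, and papermill inserts a new cell tagged `injected-parameters` right after it (or at the top if no cell is tagged) containing the overriding values. The CLI form is along the lines of `papermill input.ipynb output.ipynb -p ds_name iris`, and the Python API is `papermill.execute_notebook(input_path, output_path, parameters={...})`. The sketch below reimplements only the injection step with the standard library. It is a simplification: papermill's real translators handle more types and languages, and `to_source` with its R branch is a hypothetical illustration.

```python
import copy
import json


def to_source(params: dict, language: str = "python") -> str:
    """Hypothetical translator: render scalar parameters as assignments."""
    lines = []
    for name, value in params.items():
        if language == "python":
            lines.append(f"{name} = {value!r}")
        elif language == "r":
            # R spells booleans TRUE/FALSE; json.dumps quotes strings and numbers.
            literal = ("TRUE" if value else "FALSE") if isinstance(value, bool) else json.dumps(value)
            lines.append(f"{name} <- {literal}")
        else:
            raise ValueError(f"unsupported language: {language}")
    return "\n".join(lines)


def _tags(cell: dict) -> list:
    return cell.get("metadata", {}).get("tags", [])


def inject_parameters(notebook: dict, params: dict, language: str = "python") -> dict:
    """Return a copy of the notebook with an 'injected-parameters' cell added."""
    nb = copy.deepcopy(notebook)
    # Drop a cell injected by a previous run so values don't stack up.
    nb["cells"] = [c for c in nb["cells"] if "injected-parameters" not in _tags(c)]
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": to_source(params, language),
        "execution_count": None,
        "outputs": [],
    }
    tagged = [i for i, c in enumerate(nb["cells"]) if "parameters" in _tags(c)]
    # Insert after the first 'parameters' cell, or at the top if there is none.
    nb["cells"].insert(tagged[0] + 1 if tagged else 0, injected)
    return nb


template = {"cells": [
    {"cell_type": "code", "metadata": {"tags": ["parameters"]},
     "source": 'ds_name = "cars"', "execution_count": None, "outputs": []},
    {"cell_type": "code", "metadata": {},
     "source": "df = getattr(data, ds_name)()", "execution_count": None, "outputs": []},
]}
run_nb = inject_parameters(template, {"ds_name": "iris"})
print(run_nb["cells"][1]["source"])  # ds_name = 'iris'
```

Because the override is ordinary code placed after the defaults, re-running the output notebook by hand reproduces the scheduled run exactly.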
27:45
And this is what papermill did here with this example: we injected ds_name = "iris", and
27:52
then you can see down here we actually have the iris dataset loaded, because we loaded it via a parameter
27:58
into the code. But it's actual code: if I rerun this whole thing, it's going to do the same thing, right? So I can just run this code and it will
28:04
do the exact same execution, because I injected it as code. So the other reason that we
28:09
really adopted this, this idea of notebooks everywhere, like, why not use a notebook,
28:14
was because it makes a really good integration tool. You can inject all the parameters that were involved in
28:20
your job execution, and you can rerun and reproduce exactly what was run on that scheduled job just by
28:26
copying the notebook and re-running it. So you get all of this as a single document that
28:31
represents what you wanted to do, how you configured it, what the outcomes were, what the graphs and
28:37
logs were, and how to rerun it, all in one document. So you reduce the number of tools you need to know to interact and debug down
28:44
to one. It's almost like a placeholder: you perfect this template, and then you can just reuse it as you will. Yep. And you see here, I actually
28:51
haven't even started the Python kernel, but I can still do things on here. I've already changed the graph type, and I can set
28:57
sepal length versus sepal width, and then we can do things like, let's make the
29:02
petal width the bubble size. I'm doing this without actually running any code, because the data was actually saved into
29:08
the notebook for this DataFrame, so you can just load Data Explorer right off of it. We actually use this a lot;
29:14
we do it at Netflix, and we'll do this at Noteable as well: you can make read-only interfaces that can read the scheduled
29:20
outcomes, and then you can still look at and explore the data without having to run any actual code. Nice.
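Because an executed notebook is just a JSON document, tooling can read it back without starting a kernel, which is what makes read-only views like these possible. The sketch below, standard library only, pulls the injected parameters and any saved Table Schema outputs (the mimetype Data Explorer renders) out of a notebook dict. It parses the injected cell with `ast` instead of executing it, which only works for simple `name = literal` assignments; this is an illustrative assumption, not how papermill or Noteable implement their viewers, and the `limit` parameter and table rows are made-up examples.

```python
import ast

DATARESOURCE = "application/vnd.dataresource+json"


def injected_parameters(notebook: dict) -> dict:
    """Recover name -> value pairs from the 'injected-parameters' cell, if any."""
    for cell in notebook["cells"]:
        if "injected-parameters" in cell.get("metadata", {}).get("tags", []):
            source = cell["source"]
            if isinstance(source, list):
                source = "".join(source)
            params = {}
            # Parse instead of executing: only `name = literal` lines count.
            for node in ast.parse(source).body:
                if (isinstance(node, ast.Assign) and len(node.targets) == 1
                        and isinstance(node.targets[0], ast.Name)):
                    params[node.targets[0].id] = ast.literal_eval(node.value)
            return params
    return {}


def saved_tables(notebook: dict) -> list:
    """Return Table Schema payloads stored in cell outputs, running nothing."""
    tables = []
    for cell in notebook["cells"]:
        for output in cell.get("outputs", []):
            if output.get("output_type") in ("display_data", "execute_result"):
                bundle = output.get("data", {})
                if DATARESOURCE in bundle:
                    tables.append(bundle[DATARESOURCE])
    return tables


executed = {"cells": [
    {"cell_type": "code", "metadata": {"tags": ["injected-parameters"]},
     "source": "ds_name = 'iris'\nlimit = 150", "execution_count": 1, "outputs": []},
    {"cell_type": "code", "metadata": {}, "source": "df", "execution_count": 2,
     "outputs": [{"output_type": "execute_result", "execution_count": 2, "metadata": {},
                  "data": {"text/plain": "...", DATARESOURCE: {
                      "schema": {"fields": [{"name": "index", "type": "integer"},
                                            {"name": "species", "type": "string"}]},
                      "data": [{"index": 0, "species": "setosa"}]}}}]},
]}
print(injected_parameters(executed))  # {'ds_name': 'iris', 'limit': 150}
print(len(saved_tables(executed)))    # 1
```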
29:28
okay so yeah are you done with the project demo yeah i think i just wanted to choke briefly like kind of some of the
29:34
flexibility of those two tools we’ve kind of been talking about in jupiter and the paper mill and then uh i think we want
29:41
to spend a little more time talking about like what are all the tools and then how do they relate to that that type of ecosystem okay cool
29:48
uh yeah maybe you can discuss that in the project roadmap discussion which we’re now
29:53
going to have so we’re going to shift gears again and this is another fun segment for
29:59
those of you who are here for the first time where matt’s going to be talking
30:05
about where nteract is going and what future directions it’ll be taking while i’m listening and learning
30:11
so with that matt can you tell us a bit about the directions that nteract
30:17
is heading yeah so nteract at this point we have
30:23
lots of projects in exploration mode so if you look at nteract you might first think it’s only a desktop app
30:28
the desktop app is a proof of concept and it’s a cool usable tool for lots of individuals and the
30:35
app is running that nteract interface i just showed you in an electron app so you can
30:40
install it and run it anywhere but the larger ecosystem of nteract is
30:46
actually a collection of libraries there’s the nteract ui which is the front-end component library that
30:53
defines all the things you just saw as individual react components that you can reuse and that’s actually the most
31:00
valuable thing in nteract i would say even though it’s the least visible to people it’s used a lot and it’s
31:06
actually the differentiator between that ui and other uis you can take the pieces you
31:11
want and build your own and then on the back-end side there’s a whole plethora of python
31:18
libraries that do different tasks that we’ll talk about but in terms of roadmap i would say that
31:24
on the nteract side the roadmap is to get that nteract beta i just showed you polished more and get more
31:32
production-ready features there are some quality-of-life things in there that are friction points so
31:37
we’re trying to get more contributors to help fix the few usability issues that are in there
31:43
and make it a really viable well-built and well-constructed ui
31:49
today it’s a ui you can play with but there are definitely friction points you hit and small bugs that need to be fixed
31:55
and the framework behind it is really cool so it’s got a good base so i think the roadmap there is really
32:00
making sure it’s usable for people to reuse the components better and making sure that that ui gets improved over
32:05
time there’s been a grant recently for making the desktop ui experience a lot better that’s being worked on right now
32:11
so that’s really cool and then on the back-end side of things i would say papermill at this
32:18
point is mostly mature and feature complete there are some edges around how far we want to push
32:25
papermill to encompass so the roadmap there is about
32:30
seeing what’s needed in the community and then figuring out a way to do it that’s opinionated doesn’t lead people into making
32:36
mistakes and is flexible enough for people to reuse and then the other libraries around it
32:42
like scrapbook and testbook and some of those others are really in beta status where they’re usable so the roadmap for
32:48
them is to get them feature complete with 1.0 releases that have everything so we feel like hey it’s a completed project
32:54
and then from there continue with small things that people want so we’re happy to have more people lead on
33:00
those types of things and contribute and then on the jupyter side there’s a bajillion things so i don’t know where to start on
33:06
that one there are a lot more projects and on the back end there it’s just please keep things going and moving forward i
33:13
think some format improvements so one of the roadmap things a lot of people want to
33:19
see is the spec move forward so the jupyter spec gets some new improvements for things that have been asked for a long time i’m
33:24
gonna maybe do a push to try and get ids on the cells and that’d be really cool and things like that so i think
33:31
there’s movement there and then jupyterlab and binder and some of those others have their own road
33:36
maps that i’m less familiar with yeah i can talk more about those
33:42
individual projects as well we can dig in if you want yeah if you want to do a little bit there we’ve got time
33:48
yeah i’m curious what you’re referring to when you’re talking about the jupyter spec
33:54
yeah so that’s actually one of the things that’s made jupyter so successful there are two factors well three i think
34:01
one is they had a community component where many people could get involved and nteract especially has tried to really
34:06
emphasize that aspect there’s also that jupyter is component
34:12
based and even where it’s not strictly component based you can inherit and override
34:17
or extend or plug in especially on the nteract side you can plug and play anything so the emphasis there was to make it so
34:25
it was interoperable you have this idea of a contract for how each part works and you can match against that contract
34:31
which means companies were able to adopt the technology easily because they have some extra business logic that they just have to plug in where that business
34:37
logic applies and leave everything else the same open source everyone’s used to and then the third is the
34:45
actual spec the file format the ipynb file is a well
34:50
defined spec that says what is included in that document to
34:56
specify what a notebook is and so a lot of groups even non-jupyter groups actually use the ipynb
35:02
spec to define their notebook document and what’s inside of it so it’ll have things like your cells and
35:09
the metadata about the cells any flags that are set execution times and the
35:14
kernel you used and the runtime information so all that stuff gets baked into that notebook document
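For reference, here is a hand-written sketch of a minimal `.ipynb` document in nbformat 4.5, the minor version that added the per-cell `id` field mentioned earlier in the roadmap discussion. It comes with a tiny check for a subset of the required keys. The key list is abridged from the nbformat JSON schema as we understand it. In real code, the `nbformat` package's `validate()` is the authoritative check, and `minimal_notebook` and `missing_fields` are illustrative names.

```python
from __future__ import annotations

# Abridged from the nbformat v4 JSON schema; the schema itself is authoritative.
# "id" on cells is required from nbformat 4.5 onward.
REQUIRED_TOP = {"nbformat", "nbformat_minor", "metadata", "cells"}
REQUIRED_CELL = {
    "code": {"id", "cell_type", "metadata", "source", "outputs", "execution_count"},
    "markdown": {"id", "cell_type", "metadata", "source"},
    "raw": {"id", "cell_type", "metadata", "source"},
}


def minimal_notebook() -> dict:
    return {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {
            # Which kernel produced this document, plus runtime details.
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"},
            "language_info": {"name": "python"},
        },
        "cells": [
            {"cell_type": "markdown", "id": "intro", "metadata": {}, "source": "# Daily report\n"},
            {
                "cell_type": "code",
                "id": "sum",
                "metadata": {"tags": ["parameters"]},  # free-form, tool-specific flags
                "execution_count": 1,
                "source": "1 + 1\n",
                "outputs": [
                    {"output_type": "execute_result", "execution_count": 1,
                     "metadata": {}, "data": {"text/plain": "2"}}
                ],
            },
        ],
    }


def missing_fields(nb: dict) -> list[tuple]:
    """List (where, field) pairs for required keys that are absent."""
    problems = [("notebook", key) for key in sorted(REQUIRED_TOP - set(nb))]
    for index, cell in enumerate(nb.get("cells", [])):
        required = REQUIRED_CELL.get(cell.get("cell_type"), set())
        problems += [(index, key) for key in sorted(required - set(cell))]
    return problems


nb = minimal_notebook()
print(missing_fields(nb))  # []
del nb["cells"][1]["outputs"]
print(missing_fields(nb))  # [(1, 'outputs')]
```

The free-form `metadata` dictionaries are where tools add their own fields. Everything else is the shared contract that lets non-Jupyter tools read and write the same files.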
35:20
and it’s been kind of a static set of fields for a couple of years there have only been a few minor
35:27
additions so there’s been some push to move that forward and then you might also hear spec in the sense of
35:33
kernel communications there’s also the protocol for how the back-end
35:39
component the thing actually running your notebook code communicates with a server or a
35:46
papermill headless execution that communication protocol has evolved a little in terms of
35:51
quality-of-life improvements so parallel execution works better and things like that but it hasn’t had any spec
35:56
movement either so there’s some demand to have more message types available
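On the protocol side, every Jupyter message has the same envelope: a `header`, a `parent_header` linking it to the message it answers, `metadata`, `content`, and binary `buffers`. Below is a rough sketch of an `execute_request` and its `execute_reply` as plain dicts, assuming protocol version 5.3. It skips the ZeroMQ framing and HMAC signing that real clients such as jupyter_client handle, and the helper names are hypothetical.

```python
from __future__ import annotations

import uuid
from datetime import datetime, timezone

# Version string is an assumption; kernels report theirs in kernel_info_reply.
PROTOCOL_VERSION = "5.3"


def make_message(msg_type: str, content: dict, session: str, parent: dict | None = None) -> dict:
    """Build the dict form of a Jupyter message (before signing/serialization)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session,
            "username": "demo",
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": msg_type,
            "version": PROTOCOL_VERSION,
        },
        # Replies point back at the request via parent_header.
        "parent_header": parent["header"] if parent else {},
        "metadata": {},
        "content": content,
        "buffers": [],
    }


def execute_request(code: str, session: str) -> dict:
    """What a front end or a headless runner sends on the shell channel."""
    return make_message(
        "execute_request",
        {
            "code": code,
            "silent": False,
            "store_history": True,
            "user_expressions": {},
            "allow_stdin": False,
            "stop_on_error": True,
        },
        session,
    )


def execute_reply(request: dict, execution_count: int, kernel_session: str) -> dict:
    """The kernel's answer, linked to `request` through its parent_header."""
    return make_message(
        "execute_reply",
        {"status": "ok", "execution_count": execution_count, "user_expressions": {}},
        kernel_session,
        parent=request,
    )


request = execute_request("1 + 1", session="client-session")
reply = execute_reply(request, execution_count=1, kernel_session="kernel-session")
print(reply["parent_header"]["msg_type"])  # execute_request
```

The actual result of running the code (the `2` from `1 + 1`) does not travel in the reply. It arrives separately as iopub messages such as `execute_result`, which is one reason a new message type is a spec-level change rather than a local tweak.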
36:03
yeah so those are the spec changes that i think would be good for the ecosystem for
36:09
everyone it’s a ubiquitous improvement that we could make the risk is
36:14
that when we change things even things like fixing bugs in the low-level code
36:20
sometimes fixing those bugs causes disruption for people whose code relied on that bug being there to keep running
36:26
and it can sometimes be a big disruption things like you shouldn’t have been able to save that file because you
36:33
couldn’t set the permissions so you’re leaking secrets and then we introduced a patch that made it so hey it will actually
36:39
check that the secrets are saved with the right permissions and then all these docker and windows installations all
36:44
started failing which is really frustrating because they couldn’t set the permission bits so things like that can make it
36:50
a little bit of a stressful thing to push forward but for the good of the whole ecosystem i
36:57
think we should do it okay that’s awesome now we’ll move on and we’ll be answering some of the
37:02
questions from the audience so yeah thank you very much for asking those questions
37:09
well i feel like this question could go pretty far given that we’ve
37:15
already talked about satellites but what are the use cases that you’ve seen for jupyter and
37:21
nteract anything that you’d like to highlight that really sticks out in your experience
37:27
yeah i think there are some really awesome things people don’t know you
37:32
could do one was people doing instrument monitoring because it’s really handy to have something where they just continuously run
37:38
instrument monitoring and have a visualization of the results they were doing that with papermill i think that was the satellite use
37:44
case though they didn’t talk too much about the internal details when they asked me questions so
37:50
in that one they had to continuously monitor and then output and it was really easy to just look at a notebook real quick to see
37:56
the current status of things or go back and see what happened you know five hours ago and what the status of the
38:02
metrics was at that time so it was a very low-friction way to do monitoring some ones that people maybe don’t know
38:08
about as much are a lot of operational tasks in a very similar vein for monitoring or
38:15
doing tasks in systems at netflix for example we scheduled a lot of our operational tasks against the
38:20
scheduler and adjacent systems just making a notebook that cleaned things up and recorded what it cleaned up
38:25
so if you have something like a ttl on some data or an old
38:32
set of content or something you need to keep an eye on until it’s all done you can just set up a notebook to check
38:37
the status of that thing and report on it and you just run that every hour every day whatever cadence makes sense
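The "run it on a cadence and look back later" pattern mostly comes down to writing one output notebook per run, for example by passing a distinct `output_path` to papermill on each execution. Below is a minimal sketch of such a layout that answers "what did it report five hours ago?". The `runs/<job>/<UTC timestamp>.ipynb` naming scheme and the `cleanup_check` job name are hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone
from pathlib import PurePosixPath

# Hypothetical naming scheme: one output notebook per run, named by UTC start time.
STAMP = "%Y%m%dT%H%M%SZ"


def output_path(job: str, run_time: datetime) -> PurePosixPath:
    """Where a single scheduled execution of `job` would save its notebook."""
    return PurePosixPath("runs") / job / f"{run_time.strftime(STAMP)}.ipynb"


def run_time_of(path: PurePosixPath) -> datetime:
    """Recover the run time from a path produced by output_path()."""
    return datetime.strptime(path.stem, STAMP).replace(tzinfo=timezone.utc)


def latest_run_before(paths, when: datetime) -> PurePosixPath | None:
    """The most recent run at or before `when`, or None if there is none."""
    earlier = [p for p in paths if run_time_of(p) <= when]
    return max(earlier, key=run_time_of, default=None)


now = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
hourly = [output_path("cleanup_check", now - timedelta(hours=h)) for h in range(24)]
print(latest_run_before(hourly, now - timedelta(hours=5)))
# runs/cleanup_check/20210301T070000Z.ipynb
```

Sortable UTC timestamps in the file name keep the history browsable with nothing more than a file listing, which fits the "no extra infrastructure" point.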
38:43
and that’s a really handy tool because you don’t need to define a bunch of infrastructure
38:48
and everything you just write a notebook to ask what’s the status of the thing and you can go look
38:53
at it any time it’s not a full replacement for a metrics system but it’s a good stand-in for low
38:58
friction low-consequence things you want to keep an eye on some other use cases that i thought
39:04
were really cool that someone told me about there’s one group that uses it for threat investigation so
39:11
security work where when they detect a security attack
39:17
and they’re searching for what that attacker has done in the system or how far they penetrated
39:22
it’s a penetration report they actually do all the analysis work in the notebook and at the end they just save that
39:27
notebook and that’s the record of how they found what was there and what the user had accessed and it’s a reproducible document that
39:33
has the logs of everything associated with that attack i thought that was a really cool use case for jupyter
39:39
definitely that’s exciting and it seems like it’s a little bit of everything yeah i mean it’s an interactive code
39:46
environment right and the reason it gets used in a lot of different ways is because you can have logs visualizations and the code you
39:52
used to execute all in one place and the cell concept is a really nice construct for
39:58
exploration and it also makes a really good concept for integrations because
40:04
when you have large programs you often don’t know where messages are coming from or what they’re
40:09
associated with because there’s this huge code base and a big log file that’s a megabyte long and you don’t know what goes with what so you
40:15
get log lines you have to trace whereas in notebooks because you’re breaking code up into little mini
40:20
function-like units you get a lot of benefits for both integrations and exploration because you can focus on just the part that was
40:27
involved in what crashed or what didn’t work or what you were looking at visually yeah awesome
40:32
the next question is from zach does papermill have a way to inject parameters
40:37
or values into markdown cells not into markdown cells no you’re
40:44
probably hinting a little bit at the rstudio type functionality that’d be a cool feature i think
40:50
getting markdown to work more like what r users are used to would be really cool there’s been talk about it but
40:57
i think someone needs to go spearhead that generally what it does is
41:03
it injects the parameters as code now that being said because papermill
41:09
is written entirely plug and play every aspect of how papermill runs is either a registered action or a
41:16
plug-and-play class you give it which means you can actually implement your own
41:24
in this case it’d probably be the translator or the engine depending on how far you want to go
41:29
where you can override what those parameters are used for some teams do this for example to
41:35
change it so if you inject parameters into your notebook and they detect that one is a table
41:41
name as a string they’ll convert that to a pandas data frame that loads the table automatically for you or a lazy data frame or
41:47
something so you can do all sorts of stuff like that and change the behavior without having to
41:52
rewrite the open source you can just take one of those pluggable components and extend it to your use case
41:58
and sometimes those get merged back sometimes you say that’s really awesome you should document that and tell people about it okay awesome
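The idea of swapping in your own translator can be illustrated without papermill itself. papermill's real extension points are its translators (in `papermill.translators`, registered per kernel language) and its engines. This sketch does not reproduce their class hierarchy. `ParamTranslator`, `TableTranslator` and the `load_table` helper are hypothetical stand-ins for the "turn a table name into a data frame" trick described above.

```python
from __future__ import annotations


class ParamTranslator:
    """Turn parameter values into Python source, one assignment per line."""

    def translate(self, value) -> str:
        if isinstance(value, str):
            return self.translate_str(value)
        return repr(value)  # ints, floats, bools, None, and containers

    def translate_str(self, value: str) -> str:
        return repr(value)

    def codify(self, params: dict) -> str:
        lines = ["# Parameters"]
        lines += [f"{name} = {self.translate(value)}" for name, value in params.items()]
        return "\n".join(lines) + "\n"


class TableTranslator(ParamTranslator):
    """Hypothetical team extension: a string naming a known table becomes a loader call.

    `load_table` is a placeholder for whatever helper the target notebook
    imports (it could return a pandas or a lazy data frame). Only top-level
    string parameters are rewritten; strings nested in lists or dicts go through repr().
    """

    def __init__(self, known_tables):
        self.known_tables = set(known_tables)

    def translate_str(self, value: str) -> str:
        if value in self.known_tables:
            return f"load_table({value!r})"
        return super().translate_str(value)


translator = TableTranslator(known_tables={"events"})
print(translator.codify({"source": "events", "label": "daily", "limit": 10}), end="")
# # Parameters
# source = load_table('events')
# label = 'daily'
# limit = 10
```

Overriding a single hook keeps the rest of the default behavior intact. That is the property that makes this kind of change easy to keep private or to upstream later.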
42:08
yeah so jared just asked a question and it kind
42:13
of overlapped with mine again i’m really excited about your company starting but as you’re starting noteable it
42:21
depends a lot on different ecosystems and stuff so how do you see your company taking
42:28
a leadership role in open source and how do you see this partnership between a company and
42:35
open source playing out for you all yeah i think everyone on the founding team
42:41
has worked with and maintained open source code and we all really value open source we
42:48
want to be a partner to open source we don’t want to be a dictator over it and we don’t want to be
42:54
someone who just makes use of it and doesn’t contribute back so really our emphasis is going to be
43:00
that our role in open source is hopefully as an engagement partner able to
43:05
bring other people together to talk about the open source so we want to be able to say hey we’re experts on this set of open
43:11
source and we’ll contribute to it both nteract and probably some to jupyter itself
43:16
we’ll be contributing code back to that ecosystem and that means we know enough there to
43:22
have the basis for conversations with other people so we want to bring people in netflix was doing
43:27
this a lot back like a year or a year and a half ago we were bringing in other companies that were solving similar problems and
43:34
talking with them and we want to do the same thing and bring in other companies that are having issues within jupyter they’re
43:41
trying to solve or on top of jupyter or use cases or things they need in the spec or in
43:47
a library that everyone would benefit from we want to help facilitate those conversations and so
43:53
really the role we see is being facilitators for improving the open source as a company awesome
44:01
thank you for that that was a great answer the next question is from carol thank you carol what standard or spec
44:07
would you address first notebook format a messaging protocol or something else
44:13
i think the notebook format is the one that most needs to move forward there are some key things
44:18
missing in there that everyone keeps re-implementing and the more people do that the more they fracture away from jupyter
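To make the re-implementation problem concrete: different tools record the same fact, such as how long a cell ran, under their own metadata namespaces, so a consumer ends up probing several locations. This sketch assumes papermill's `papermill.start_time`/`end_time` keys and the `execution` timing keys that JupyterLab can record. Treat these paths as illustrative and verify them against the tools you use. A field defined in the spec itself would reduce the list to a single lookup.

```python
from __future__ import annotations

from datetime import datetime

# Places where different tools have recorded cell start/end times in cell
# metadata. These paths are illustrative; check the tools you actually use.
TIMING_LOCATIONS = [
    (("papermill", "start_time"), ("papermill", "end_time")),
    (("execution", "iopub.execute_input"), ("execution", "shell.execute_reply")),
]


def _lookup(mapping, path):
    """Follow a tuple of keys through nested dicts; None if any key is missing."""
    for key in path:
        if not isinstance(mapping, dict) or key not in mapping:
            return None
        mapping = mapping[key]
    return mapping


def _parse(stamp: str) -> datetime:
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


def cell_duration_seconds(cell: dict) -> float | None:
    """Cell run time from whichever namespace recorded it, else None."""
    meta = cell.get("metadata", {})
    for start_path, end_path in TIMING_LOCATIONS:
        start, end = _lookup(meta, start_path), _lookup(meta, end_path)
        if start and end:
            return (_parse(end) - _parse(start)).total_seconds()
    return None


from_papermill = {"metadata": {"papermill": {
    "start_time": "2021-03-01T10:00:00.000000", "end_time": "2021-03-01T10:00:02.500000"}}}
from_lab = {"metadata": {"execution": {
    "iopub.execute_input": "2021-03-01T10:00:00.000000Z",
    "shell.execute_reply": "2021-03-01T10:00:01.000000Z"}}}
print(cell_duration_seconds(from_papermill), cell_duration_seconds(from_lab))  # 2.5 1.0
```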
44:25
so for me i think getting improvements on identifying notebooks and cells and
44:31
tracking what happened in that cell better and making a more general
44:37
space for that would be really good there is a generalization in the format the metadata space where you can put any
44:43
namespaced field on cells or on the notebook itself the problem is for some of these
44:49
things that are really common you need it in the actual spec because everyone implements it slightly
44:54
differently and then when you need to write your react component that respects that field you need to look at five different fields and it’s
45:00
not great because they move and change so i think getting the notebook format to move forward a little bit would be the best thing on the
45:08
communication spec there are probably some small things that could be done soon
45:14
but i think that one is working and the things that are needed are maybe more niche
45:22
even if they’re important okay awesome well now we’re going to move on getting towards the end of the episode
45:28
but first before we end we’ve got our world famous rant and rave section where
45:34
we each get 15 seconds to either rant or rave about a topic imagine that it’s like a soapbox so
45:41
matt you’re up all right so before i actually do my real rave i do want to say we
45:49
haven’t really talked about world affairs right now i think if i have one second on that
45:55
it’s if you haven’t and you are able to contribute or donate to a group that you think will help with
46:01
the world situation i’m going to be doing that more and more and i encourage others to and on a
46:08
lighter subject not world affairs i think being nice to the maintainers
46:14
matters be nice to your maintainers most of us are working on our free time on
46:19
these projects all the stuff i did in jupyter was literally on my free time and we care
46:25
very deeply about the success of these projects and i’ve seen a lot of people get burned out on the constant criticism and argumentative behavior that a few
46:32
individuals sometimes bring most people are great but occasionally it really does a number on people’s mental
46:38
health and i’ve seen so many people get burned out to the point where they stop engaging in software entirely so be nice to your maintainers
46:44
they’re trying to do the best they can help them out don’t fight with them please
46:50
all right and tony you’re up 15 seconds on the soapbox that’s great i’m just gonna follow up on what matt said
46:56
just more in general because i’m a user not a maintainer but look out for your mental health look out for
47:01
your family’s mental health look out for your friends’ mental health be patient with people listen to them
47:06
and regard your privilege when you have these conversations and do what you can to help donate
47:14
contribute to open source do something productive and godspeed that was beautiful yeah
47:22
mine is sort of a semi-rant semi-rave but i’ve been meditating every day for
47:27
35 days now and i just get so frustrated because i’m like okay am i doing it right it doesn’t
47:33
feel like i’m doing it right and yesterday was the first day where i felt like i sort of shut everything off and was just centered and
47:38
focusing and i was like wow okay it took 35 days i don’t know if i’ll be able to do it again but it felt good and i wanted to have a
47:45
little mini rave for that that’s all we have time for today sadly but thank you very much everyone for
47:51
watching and listening and matt thank you very much for joining us you can find us on twitter at open
47:59
teams inc at open team sorry and at quansight ai matt so where can people
48:07
find you and jupyter and nteract yeah so i’m on twitter at code seals that’s a
48:13
good way to ping me or get a hold of me and at project underscore jupyter is the
48:19
jupyter tag and at nteractio all one word is the tag if you want to connect with us
48:25
more personally or in a back-and-forth type of thing also the slack is slack.nteract.io
48:30
on the nteract side and we pay attention and try to respond there too and then of course on github issues
48:36
awesome yes if you liked what you saw today then please go to our youtube channel and like and subscribe to see
48:42
more of this kind of content it really does help and you’ll be able to stay up to date with the next
48:47
episode you’ll never miss a beat so we look forward to all of you joining us next episode
48:52
join us again for our next discussion we’ll be with a recall graph so thanks very much
48:58
everyone have a good day and goodbye happy friday thank you all