Staying Up-to-date in a Sea of Papers: The CCRi Journal Club


Machine learning is a large and fast-paced field, and keeping up-to-date with the latest and greatest can be next to impossible for any one person. A simple Google Scholar search makes it clear just how much content is published; there were over 120,000 papers in 2019 alone with the term “Deep Learning” in the title!

As a team of data scientists working on a wide array of modeling problems, we want to stay current with advances in machine learning, but we can't spend all of our time reading papers. So we decided to start our own CCRi journal club, giving everyone an opportunity to share and discuss interesting papers they come across.

Journal club, virtually

We initially planned to hold our meetings in person, but the spread of COVID-19 has of course limited our ability to be physically located in the same space. Like so many other companies and institutions across the world, we've shifted our communication to be mainly online. We've been holding our virtual journal clubs for a few months now, and though it took some getting used to, we've found that we can have fun, interactive conversations over video chat. So far we've discussed topics ranging from model explainability to image synthesis to language modeling.

In one meeting, we talked about “RISE: Randomized Input Sampling for Explanation of Black-box Models” [1], which presents a method that helps explain why computer vision models make the decisions they make. What part of an image does the model look at when it says an image of a car is a car, for example? Is it the same part of the image that humans use? The intuitive approach the authors developed makes it fairly straightforward to create saliency maps that reveal the model's decision-making process.

         Figure 1 from [1] showing how different areas of an image contribute to classification decisions.
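The core idea can be sketched in a few lines of NumPy. This is a simplified illustration of random input sampling, not the paper's implementation (the real method upsamples masks with smooth interpolation and random shifts, and normalizes by the mask probability); `model_score` here is a hypothetical stand-in for a classifier's confidence in one class.

```python
import numpy as np

def rise_saliency(model_score, image, n_masks=1000, grid=8, p=0.5, seed=0):
    """RISE-style saliency: average random binary masks, each weighted by
    the model's score on the correspondingly masked image."""
    rng = np.random.default_rng(seed)
    H, W = image.shape[:2]
    saliency = np.zeros((H, W))
    for _ in range(n_masks):
        # Coarse random occlusion pattern, upsampled to image size.
        coarse = (rng.random((grid, grid)) < p).astype(float)
        mask = np.kron(coarse, np.ones((H // grid, W // grid)))
        # Regions that stay visible when the score is high accumulate weight.
        saliency += mask * model_score(image * mask[..., None])
    return saliency / n_masks
```

Pixels whose visibility correlates with a high score end up bright in the saliency map, which is what Figure 1 visualizes.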

In another meeting, we discussed “Language Models are Few-Shot Learners” [2], a paper by the AI research company OpenAI that presents their newest and largest language model: GPT-3. The model is massive, with approximately 175 billion parameters, and it was trained on a huge corpus of text scraped from across the internet. A key result of this work is that the model can perform new tasks, such as language translation, question answering, arithmetic, and text generation, with minimal or no task-specific training. The following shows some “poetry” that GPT-3 generated after analyzing the work of 20th-century American poet Wallace Stevens:

Nobody will come to this place. It is a road that leads nowhere.
The solitude is deep. The mountains are high.
But they are desolate, and they turn the traveler’s face
Towards the North. All the sounds of the world are far away.
When the wind rises above the trees,
The boughs bow to the ground.
Even the birds that inhabit the tangle of weeds
That is the roadside cover, are silent. One listens,
But hears no roar of the forest. One is alone.
One will be taken.
One will be taken.
There is no utterance, there is no conversation,
But one is uneasy all the same…
There is a thin blue mist,
A darkness rising like smoke,
And within that darkness
A possession of the heart.
One will be taken… It was here, and it will be here again
Here, under this sky empty and full of light.

         Figure F.1 from [2] showing a poem generated by GPT-3 in the style of Wallace Stevens.


The GPT-3 work touches on a lot of topics in machine learning and data science, and we discussed questions like “to what extent is this model performing actual reasoning?” and “how does a model like this affect the field at large, and what are the ethical ramifications?”
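“Few-shot” here means the model is conditioned on a handful of worked examples supplied directly in its input, with no gradient updates. A minimal sketch of what such a prompt might look like for translation (the example pairs are illustrative, in the spirit of the paper's figures):

```python
# A few-shot prompt: the model is expected to infer the task
# (English-to-French translation) from the examples alone.
examples = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]
prompt = "Translate English to French:\n"
for english, french in examples:
    prompt += f"{english} => {french}\n"
prompt += "plush giraffe =>"  # the model would continue with a translation
print(prompt)
```

The same prompt format, with different examples, turns the model toward question answering, arithmetic, or any other text-in, text-out task.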

Looking forward

We plan to keep holding our monthly journal club: it gives our team members a chance to take their minds off of their own projects for a bit, and it creates a space for the kind of discussion and interaction that has been lacking during this period of social distancing. Combined with a machine learning seminar we just started, it's part of an overall effort to build an engaging and active data science community within CCRi.

References

[1] Petsiuk, V., Das, A., & Saenko, K. (2018). Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421.

[2] Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., et al. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.
