This post is from 2016. I hope it’s helpful (it’s aimed at me when I’ve forgotten how).

Trying to come up with a functional way to read a data file, I came across lots of alternatives, but along the way I wrote the code below, which introduces quite a few concepts.

The code comes first, which you can cut and paste, with some notes and explanation afterwards while it’s relatively fresh in my mind.

package com.jonathanplay.tryandforcomprehension

import java.io.{DataInputStream, FileInputStream}

import scala.annotation.tailrec
import scala.collection.mutable.ArrayBuffer
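import scala.language.reflectiveCalls // lets `using` call close() on a structural type without a feature warning
import scala.util.{Failure, Success, Try}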

object OpenAndReadAFile extends App {
  def readingData(fileName: String): Option[List[(Int, Long, Int)]] = {
    @tailrec
    def readDataInputStream(acc: ArrayBuffer[(Int, Long, Int)], dis: DataInputStream): Try[List[(Int, Long, Int)]] = {
      if (dis.available() == 0) Success(acc.toList)
      else readDataInputStream(acc += Tuple3(dis.readInt(), dis.readLong, dis.readInt), dis)
    }

    def using[A <: {def close() : Unit}, B](closeable: A)(f: A => B): B =
      try f(closeable) finally closeable.close()

    def readIndexFileStream(fis: FileInputStream): Try[List[(Int, Long, Int)]] = {
      for {
        dis <- Try(new DataInputStream(fis))
        listOfTuples <- readDataInputStream(ArrayBuffer.empty[(Int, Long, Int)], dis)
      } yield listOfTuples
    }

    def readIndexFile(): Option[List[(Int, Long, Int)]] = {
      Try(using(new FileInputStream(fileName))(readIndexFileStream)).flatten
    } match {
      case Success(result) => Some(result)
      case Failure(ex) =>
        println(s"Could not read file $fileName, detail ${ex.getClass.getName}:${ex.getMessage}")
        None
    }

    readIndexFile()
  }

  readingData("FileDoesNotExist")
  readingData("C:\\tmp\\exampleindexfile.idx")
}

Scala upper bound parameter <:

We want to accept only a parameter whose type has a close method.

[A <: {def close() : Unit}, B]

A must be a subtype of the structural type {def close() : Unit}, that is, any type with a matching close method.

    def using[A <: {def close() : Unit}, B](closeable: A)(f: A => B): B =

The function called using has two type parameters: A must have a close method, and B can be anything. The function then has two parameter lists. The first takes closeable, which is of type A, while the second takes f, a function which takes a parameter of type A and returns type B. Whatever f returns is what using returns.
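To make the upper bound concrete, here is a minimal sketch, assuming Scala 2 (Scala 3 handles structural types differently). Door is a made-up class for illustration: it extends nothing and simply happens to have a close() method, which is all the bound asks for.

```scala
import scala.language.reflectiveCalls

object StructuralBoundSketch {
  // Same signature as in the post: A is any type with a close(): Unit method.
  def using[A <: { def close(): Unit }, B](closeable: A)(f: A => B): B =
    try f(closeable) finally closeable.close()

  // Hypothetical class: no Closeable, no AutoCloseable, just a close() method.
  final class Door(val name: String) {
    var closed = false
    def close(): Unit = { closed = true }
  }

  // Returns the value produced by f and whether close() was called afterwards.
  def demo(): (String, Boolean) = {
    val door = new Door("front")
    val result = using(door)(d => d.name.toUpperCase)
    (result, door.closed)
  }

  // using("not closeable")(_.length) // would not compile: String has no close()
}
```

The call to close() goes through reflection at runtime, which is why the compiler asks for the reflectiveCalls import.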

How to close a resource in Scala?

This is a copy paste from the thousands of other sites which show ‘using’

    // A has to have a close function, and f is a function to run but we always close after
    def using[A <: {def close() : Unit}, B](closeable: A)(f: A => B): B =
      try f(closeable) finally closeable.close()

Which is fine. You call this curried function as below, and note that there are lots of weird syntax variations for calling it. The function which actually reads the file is ‘readIndexFileStream’.

// With no _ or (_), Scala sees that a function is expected and knows that readIndexFileStream can be used as one
// In some cases it won't be able to, in which case you must use (_) or _ depending on your aim
using(new FileInputStream(fileName))(readIndexFileStream)

// Alternative syntax, where (_) means pass in as a function value, x => readIndexFileStream(x), evaluated as it is used
using(new FileInputStream(fileName))(readIndexFileStream(_))
// Or the trailing _, which means a partially applied function
using(new FileInputStream(fileName))(readIndexFileStream _)
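Since Scala 2.13 the standard library also ships scala.util.Using, which covers the same ground as the hand-rolled using. The sketch below is not the code from this post. TrackedResource is a made-up class that exists only to show that close() runs on both the success and the failure path, and that Using hands back a Try.

```scala
import scala.util.{Try, Using}

object UsingSketch {
  // Hypothetical resource that records whether close() was called.
  final class TrackedResource extends AutoCloseable {
    var closed = false
    def read(): Int = 42
    override def close(): Unit = { closed = true }
  }

  // Using(resource)(f) runs f, always closes the resource, and wraps the result in a Try.
  def readOk(r: TrackedResource): Try[Int] =
    Using(r)(res => res.read())

  // The division throws ArithmeticException; Using still closes r and returns a Failure.
  def readBoom(r: TrackedResource): Try[Int] =
    Using(r)(res => res.read() / 0)
}
```

Because Using already returns a Try, it would need the same flatten trick if the function passed to it also returns a Try.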

Scala function value or partially applied function?

All praise to Scala Puzzlers, puzzle 12 for this information.

Scala allows you to leave off the underscore after a method name when it knows the expected type is a function, and the type of the function is consistent with the signature of the method.

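A function value written with (_) is not evaluated until it is used, and everything inside it, including any argument expressions, is evaluated every time it is called.

A partially applied function written with a trailing _ has the arguments already supplied evaluated once, at the time it is created. For the full story, read Scala Puzzlers, puzzle 12 (also called “Count Me Now, Count Me Later”).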
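The difference only shows up when an argument has a side effect. Here is a small sketch in the spirit of that puzzler, written for Scala 2 syntax. The names nextId and add are invented for this example and are not taken from the book.

```scala
object EtaVersusPlaceholder {
  var calls = 0
  def nextId(): Int = { calls += 1; calls }
  def add(a: Int)(b: Int): Int = a + b

  // Returns (calls after creating both functions, late(10), late(10) again, early(10)).
  def demo(): (Int, Int, Int, Int) = {
    calls = 0
    // (_) form: an anonymous function b => add(nextId())(b); nextId() runs on every call.
    val late: Int => Int = add(nextId())(_)
    // Trailing _ form: nextId() runs once, right here, and its result is captured.
    val early: Int => Int = add(nextId()) _
    val callsAfterCreation = calls // 1: only `early` has called nextId() so far
    val first = late(10)           // nextId() returns 2, so 12
    val second = late(10)          // nextId() returns 3, so 13
    val captured = early(10)       // uses the captured 1, so 11
    (callsAfterCreation, first, second, captured)
  }
}
```

With readIndexFileStream there are no side-effecting arguments, so all three call styles behave the same.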

How to use Try for exception processing in Scala?

The for comprehension has a series of generators which are called in turn, provided the previous one did not fail. We are relying on Try for this: Try is great because it says a function may fail in a recoverable way, and the first Failure skips the remaining generators. The for comprehension composes operations which return Try with minimal syntax overhead.

def readIndexFileStream(fis: FileInputStream): Try[List[(Int, Long, Int)]] = {
  for {
    // gotcha here, if one returns a Try then the rest cannot return Seq of any type
    dis <- Try(new DataInputStream(fis))
    listOfTuples <- readDataInputStream(ArrayBuffer.empty[(Int, Long, Int)], dis)
  } yield listOfTuples
}
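Under the hood a for comprehension over Try is just flatMap and map. A minimal sketch, using a made-up parse helper rather than the file-reading code, shows the two forms side by side:

```scala
import scala.util.Try

object ForDesugarSketch {
  // Hypothetical helper: parsing can fail with NumberFormatException.
  def parse(s: String): Try[Int] = Try(s.trim.toInt)

  // For comprehension: the second generator only runs if the first succeeded.
  def sumFor(a: String, b: String): Try[Int] =
    for {
      x <- parse(a)
      y <- parse(b)
    } yield x + y

  // What the compiler turns it into: flatMap for every generator but the last, map for the last.
  def sumFlatMap(a: String, b: String): Try[Int] =
    parse(a).flatMap(x => parse(b).map(y => x + y))
}
```

This is also where the gotcha in the comment comes from: flatMap on a Try expects a function returning a Try, so a generator over a Seq does not fit.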

Scala nestable Try flattening

I want to open the file, which can throw FileNotFoundException, and then call using, which can throw IOException but will at least close the FileInputStream. But this creates a nested Try, because the outer Try wraps the Try returned by readIndexFileStream, and without the flatten you get a type mismatch which reads:

Expression of type Some[Try[List[(Int,Long,Int)]]] doesn't conform
    to the expected type Option[List[(Int,Long,Int)]]

So, Try is like Option and you simply flatten it, turning Try[Try[T]] into Try[T], to get rid of the mismatch…

def readIndexFile(): Option[List[(Int, Long, Int)]] = {
  Try(using(new FileInputStream(fileName))(readIndexFileStream)).flatten
} match {
  case Success(result) => Some(result)
  case Failure(ex) =>
    println(s"Could not read file $fileName, detail ${ex.getClass.getName}:${ex.getMessage}")
    None
}

Obviously my Failure handling is silly, but it shows what is going on.
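To see what flatten does in isolation, here is a small sketch with made-up values. It also shows toOption, which could stand in for the pattern match when you do not need to print the exception.

```scala
import scala.util.{Success, Try}

object FlattenSketch {
  // Both levels succeed.
  val nestedOk: Try[Try[Int]] = Try(Try(1))
  // The outer Try succeeds but the inner one fails (NumberFormatException).
  val innerFails: Try[Try[Int]] = Try(Try("x".toInt))
  // The outer Try itself fails before any inner Try exists.
  val outerFails: Try[Try[Int]] = Try(throw new IllegalStateException("outer"))

  val flatOk: Try[Int] = nestedOk.flatten          // Success(1)
  val flatInner: Try[Int] = innerFails.flatten     // Failure from the inner Try
  val flatOuter: Try[Int] = outerFails.flatten     // Failure from the outer Try

  // toOption drops the exception: Success becomes Some, Failure becomes None.
  val asOption: Option[Int] = flatOk.toOption
  val asNone: Option[Int] = flatInner.toOption
}
```

Either failure ends up as a single Failure, which is why the match in readIndexFile only needs two cases.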

Tailrec

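The file contains repeated Int, Long, Int records, which we read until we get to the end. A mutable ArrayBuffer collects the values as they are read in, and we convert it to an immutable List when we hit the end.

@tailrec is better explained by Google, but in short it asks the compiler to check that the recursive call is the last thing the function does. If it is, the recursion is compiled into a loop which reuses the same stack frame, so it’s efficient and cannot overflow the stack. If it is not, compilation fails.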

@tailrec
def readDataInputStream(acc: ArrayBuffer[(Int, Long, Int)], dis: DataInputStream): Try[List[(Int, Long, Int)]] = {
  if (dis.available() == 0) Success(acc.toList)
  else readDataInputStream(acc += Tuple3(dis.readInt(), dis.readLong, dis.readInt), dis)
}
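The same accumulator pattern in a smaller setting, as a sketch with a made-up sumTo function. The accumulator carries the partial result forward so that nothing is left to do after the recursive call returns.

```scala
import scala.annotation.tailrec

object TailrecSketch {
  // Tail recursive: the recursive call is the very last operation.
  @tailrec
  def sumTo(n: Long, acc: Long = 0L): Long =
    if (n <= 0) acc
    else sumTo(n - 1, acc + n)

  // Not tail recursive: the addition happens after the recursive call returns,
  // so adding @tailrec to this version would be a compile error.
  def sumToNaive(n: Long): Long =
    if (n <= 0) 0L
    else n + sumToNaive(n - 1)
}
```

sumTo copes with a million iterations because it runs as a loop, whereas a deep enough call to sumToNaive risks a StackOverflowError.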

The test file

You should create a file at C:\tmp\exampleindexfile.idx or equivalent for your Linux/PC/etc.

Type some random garbage into the file and run the code and you will get the output:

// Expected output, this is because the example file contains rubbish, so it's just exceptions
//Could not read file FileDoesNotExist, detail java.io.FileNotFoundException:FileDoesNotExist (The system cannot find the file specified)
//Could not read file C:\tmp\exampleindexfile.idx, detail java.io.EOFException:null
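If you would rather have a file that reads back cleanly, you can write one with DataOutputStream, which produces the same binary layout that DataInputStream expects. This is only a sketch: writeRecords is a hypothetical helper that is not part of the code above, and it assumes each record is an Int, a Long and an Int, 16 bytes in total.

```scala
import java.io.{DataOutputStream, File, FileOutputStream}

object WriteExampleIndexFile {
  // Writes each record as Int, Long, Int, the order readDataInputStream reads them in.
  def writeRecords(file: File, records: Seq[(Int, Long, Int)]): Unit = {
    val dos = new DataOutputStream(new FileOutputStream(file))
    try {
      records.foreach { case (first, second, third) =>
        dos.writeInt(first)   // 4 bytes
        dos.writeLong(second) // 8 bytes
        dos.writeInt(third)   // 4 bytes
      }
    } finally dos.close()
  }
}
```

For example, WriteExampleIndexFile.writeRecords(new File("C:\\tmp\\exampleindexfile.idx"), Seq((1, 100L, 7), (2, 200L, 9))) should give readingData a file it can read without hitting an exception.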

Is this the way to read data?

There are lots of Scala libraries out there, but I did not google them all, as it should be pretty simple and I wanted to do it myself and learn.

References

There are so many I will list them another time, but a word of warning, even the best of them are aging fast as Scala changes. This post is from 2016 - so if you are in the future you should double check it all.