Tuesday, May 14, 2019

String s = "aabcccccaaa"
output = "a2b1c5a3"

public String convert(String s) {
  if (s == null || s.isEmpty()) return "";

  StringBuilder sb = new StringBuilder();

  for (int i = 0; i < s.length(); ) {
    char c = s.charAt(i);
    int n = 1;
    // count the run of identical characters starting at i
    while (i + n < s.length() && s.charAt(i + n) == c) {
      n++;
    }

    sb.append(c);
    sb.append(n);
    i = i + n;
  }
  return sb.toString();
}
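For reference, here is a self-contained, runnable version of the method above (the RunLength class name is just for illustration):

```java
public class RunLength {
    public static String convert(String s) {
        if (s == null || s.isEmpty()) return "";
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            char c = s.charAt(i);
            int n = 1;
            // count the run of identical characters starting at i
            while (i + n < s.length() && s.charAt(i + n) == c) {
                n++;
            }
            sb.append(c).append(n);
            i += n;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(convert("aabcccccaaa"));  // a2b1c5a3
    }
}
```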

Monday, April 24, 2017

Java Concurrency Bugs #2 – what to synchronize on

28 Jan 2009
Following up on my previous post, this is the second installment of Java Concurrency Bugs. In this post, we’ll look at bugs that come from synchronizing on the wrong object.

#1: Don’t synchronize on an object you’re changing [source:java]

synchronized(foo) {
    foo = …
}
[/source]
If you do this, then other threads will be synchronizing on a different object than the thread that modifies foo. And if threads are synchronizing on different objects, then you're getting no mutual exclusion at all, which presumably defeats the entire point of the synchronized block.

#2: Don’t synchronize on a String literal [source:java]

private static final String LOCK = "LOCK";
synchronized(LOCK) {
….
}
[/source]
This looks quite harmless and it may even work fine, but it’s also dangerous. All string literals are interned, so any other component or library in the JVM locking on another (apparently different) string literal “LOCK” will be sharing a lock with this code. This problem does happen in the wild.
If you need an object to lock on and you don’t have any other pre-existing object that makes sense, it’s recommended to just use a new Object().
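A minimal sketch of that advice, using a hypothetical Counter class that guards its state with a dedicated private lock object no other code can reach (or intern-share):

```java
public class Counter {
    // A private lock object: unlike a String literal, this instance
    // cannot be shared with any other code in the JVM.
    private final Object lock = new Object();
    private int count;

    public void increment() {
        synchronized (lock) {
            count++;
        }
    }

    public int get() {
        synchronized (lock) {
            return count;
        }
    }
}
```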

#3: Don’t synchronize on auto-boxed values

The JLS requires that certain ranges of literals always return the same instance when boxed. Thus locking on a boxed Integer(1) (not a new Integer(1)) will be locking against the same boxed Integer anyone else in the JVM could be using. This is effectively the exact same scenario as #2 – an innocuous-looking lock object could actually be the exact same instance used elsewhere, thus sharing a lock that should be separate.
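A quick illustration of the caching behavior (the mandatory cache covers at least [-128, 127]; caching outside that range can vary by JVM configuration):

```java
public class BoxedLocks {
    public static void main(String[] args) {
        // The JLS guarantees values in [-128, 127] box to cached instances.
        Integer a = 127;
        Integer b = 127;
        System.out.println(a == b);   // true: same instance, so same monitor if locked on

        // Outside the mandatory cache range, distinct instances are typical.
        Integer c = 128;
        Integer d = 128;
        System.out.println(c == d);   // usually false
    }
}
```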

#4: Don’t synchronize on null

This should be a no-brainer but it can happen. This one’s easy to find though since you’ll get an NPE.

#5: Don’t synchronize on a Lock object

It's not that you can't do this; it's that it's probably not what you intended. If you're synchronizing on a ReentrantLock instance, you almost certainly want to be calling Lock.lock() instead.
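A sketch of the intended usage, with a hypothetical Account class; note lock.lock()/unlock() in try/finally, rather than synchronized(lock), which would use the object's monitor and ignore the Lock entirely:

```java
import java.util.concurrent.locks.ReentrantLock;

public class Account {
    private final ReentrantLock lock = new ReentrantLock();
    private int balance;

    public void deposit(int amount) {
        lock.lock();   // NOT synchronized(lock): that locks the monitor, not the Lock
        try {
            balance += amount;
        } finally {
            lock.unlock();
        }
    }

    public int balance() {
        lock.lock();
        try {
            return balance;
        } finally {
            lock.unlock();
        }
    }
}
```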

#6: Don’t synchronize on getClass()

In an inheritance hierarchy, it's possible to fall into the trap of calling getClass() at one point in the hierarchy and believing that this means every class will be synchronizing on the same lock. But when an instance of a subclass runs this code, getClass() returns the subclass, so you'll actually be locking on a different lock. [source:java]
// BAD
synchronized(getClass()) {
}
[/source]
Generally, you probably want something like this instead: [source:java]
synchronized(Foo.class) {
}
[/source]

#7: Be careful locking on a thread-safe object with encapsulated locking

It can be tempting to believe that locking on an object that is thread-safe will allow you to participate in the same lock. And sometimes you’ll be right. But sometimes you won’t. Unfortunately, the only way to tell is to read the docs and the code.
An example where this will work are the synchronized classes in java.util like Vector and Hashtable. These collections lock internally on the "this" instance itself, thus allowing you to participate externally in that lock if you want. But a class like ConcurrentHashMap does internal locking on objects that are not exposed externally, so it's not possible to participate in that lock. It's important to know which situation you're dealing with – if you're writing a class with encapsulated locking, this is a great thing to document.
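For example, the javadoc for Collections.synchronizedList documents that the wrapper locks on itself, which lets clients hold that same lock across a compound operation like iteration (the SyncIteration class name here is just for illustration):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncIteration {
    static String joinAll(List<String> list) {
        StringBuilder sb = new StringBuilder();
        // The synchronizedList javadoc documents that the wrapper locks on
        // itself, so clients can hold that same lock across the whole
        // iteration and no element can be added or removed mid-loop.
        synchronized (list) {
            for (String s : list) {
                sb.append(s);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(new ArrayList<>());
        list.add("a");
        list.add("b");
        System.out.println(joinAll(list));
    }
}
```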

Java Concurrency Bugs #3 – atomic + atomic != atomic

30 Jan 2009
Background: Poll – #1 – #2
This installment of Java concurrency bugs is a simple thing but one that still shows up too frequently, particularly among those new to concurrency. It is tempting to believe that, given a "thread-safe" class, you can use it correctly in all situations. This is not true. The problem is that composing two atomic thread-safe operations seems safe, but it has several potential pitfalls.
The most obvious pitfall is assuming that performing two (or more) atomic operations yields a new atomic operation, which is definitely not the case. For example, consider this operation on a Hashtable (in which all methods are synchronized and thread-safe): [source:java]
/**
 * If the key already exists, get its value. Otherwise add the key-value pair.
 */
public Object putIfAbsent(Hashtable table, Object key, Object value) {
    if (table.containsKey(key)) {
        // already present, return existing value
        return table.get(key);
    } else {
        // doesn't exist, create and return new value
        table.put(key, value);
        return value;
    }
}
[/source]
Since the Hashtable is thread-safe, this method is thread-safe too, right? Well, kind of. It's not going to cause visibility problems in the table or deadlocks. But there is a race condition here if multiple threads are executing putIfAbsent(). After the first if check, anything can happen. Another thread could remove the key from the table or add the key to the table, invalidating the result of the if check. And after the put call, another thread could add, remove, or modify the entry, causing this thread to return a wrong value.
To fix this code, you must either participate in the same lock that governs the existing atomic operations OR wrap those calls in a new layer of locking. I picked Hashtable quite intentionally as it locks on its instance, allowing external users to participate in the same lock used internally: [source:java]
public Object putIfAbsent(Hashtable table, Object key, Object value) {
    synchronized (table) {
        if (table.containsKey(key)) {
            // already present, return existing value
            return table.get(key);
        } else {
            // doesn't exist, create and return new value
            table.put(key, value);
            return value;
        }
    }
}
[/source]
Or you could encapsulate access to this Hashtable in a full wrapper object that completely hides the Hashtable and provides a single composite putIfAbsent. In the case of a Map that didn’t use itself as the internal lock object, this would be your only choice. In a wrapper object, it would not suffice to just lock this one method in a new lock; you need to lock ALL operations in the new lock to properly synchronize individual get(), put(), remove(), etc calls against the putIfAbsent() call.
The ConcurrentMap interface added in JDK 1.5 defines composite operations so they can be implemented efficiently. For example, the ConcurrentHashMap implementation can efficiently implement this internally as it would be a performance disaster to implement it in a wrapper.
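A quick sketch of the ConcurrentMap version, where the check-then-act is a single atomic call with no window between the check and the put:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentDemo {
    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        // Atomic check-then-act: implemented inside the map itself.
        String prev1 = map.putIfAbsent("key", "first");   // null: key was absent
        String prev2 = map.putIfAbsent("key", "second");  // "first": existing value kept
        System.out.println(prev1 + " " + prev2 + " " + map.get("key"));
    }
}
```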
Another perhaps simpler example of this can be seen with volatiles. A simple volatile counter will appear to be safe to those first encountering concurrency: [source:java]
// This is broken!
public class VolatileCounter {
    private volatile int count;

    public int next() {
        return count++;
    }
}
[/source]
The key here is that “count++” is not one atomic operation; it’s three operations. Read – Increment – Write. And the Read and Write parts are atomic but composing these three operations together does not yield an atomic composite. You really need to protect this counter with some form of synchronization or other mechanism of performing this increment atomically.
One good option in Java is the AtomicInteger (and its cousins) that let you perform this operation safely (and likely more cheaply) than with synchronization: [source:java]
public class AtomicCounter {
    private AtomicInteger count = new AtomicInteger();

    public int next() {
        return count.incrementAndGet();
    }
}
[/source]
Here we see how AtomicInteger defines a new method incrementAndGet() that is a safe (and fast) composite of two otherwise separate atomic operations. But this works only by merging them inside one encapsulated operation, not by calling separate increment and get operations.
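Conceptually, incrementAndGet amounts to a compare-and-set retry loop; here is a sketch of that idea built on AtomicInteger's public CAS API (the real JDK implementation differs, but the retry structure is the essence):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasCounter {
    private final AtomicInteger count = new AtomicInteger();

    public int next() {
        for (;;) {
            int current = count.get();
            int updated = current + 1;
            // compareAndSet succeeds only if no other thread raced in
            // between our read and our write; otherwise, retry.
            if (count.compareAndSet(current, updated)) {
                return updated;
            }
        }
    }
}
```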
This seems like a simple thing, but when it occurs as a bug it's usually far more subtle. This problem of composing lock-based concurrent code is one of the biggest obstacles to producing reusable concurrent code and to composing large systems in Java.

Java 7 TransferQueue

28 Feb 2009
Java 7 will include several improvements in the collections and concurrency libraries under JSR 166y. One addition is a new interface called TransferQueue and a corresponding implementation called LinkedTransferQueue.
TransferQueue extends BlockingQueue (which extends the Queue interface added in Java 5) and adds some new methods. BlockingQueue expresses the concept of a queue where a producer can block when adding items until space is available, and a consumer can block when removing an item until one exists.
TransferQueue takes this one step further and blocks on put until the item is actually consumed by a consumer (not just added to the queue). This new constraint is expressed in the key new method called transfer(). The name is very descriptive – because the blocking occurs until a hand-off is complete from one thread to another, you are effectively transferring the item between threads (in a way that properly creates happens-before relationships in the Java Memory Model).
Several other methods are included as well: two forms of tryTransfer() that perform a transfer but are either non-blocking (transfer only if it can be done immediately) or bounded by a timeout. And then there are a couple of helper methods, hasWaitingConsumer() and getWaitingConsumerCount().
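A small sketch of a hand-off using transfer() (the TransferDemo class and handOff method names here are illustrative):

```java
import java.util.concurrent.LinkedTransferQueue;
import java.util.concurrent.TransferQueue;

public class TransferDemo {
    // Hand one item from this thread to a consumer thread;
    // returns what the consumer actually received.
    static String handOff() {
        TransferQueue<String> queue = new LinkedTransferQueue<>();
        String[] received = new String[1];
        Thread consumer = new Thread(() -> {
            try {
                received[0] = queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            queue.transfer("hello");  // blocks until take() completes the hand-off
            consumer.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        System.out.println(handOff());
    }
}
```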
When I first read about TransferQueue, I was immediately reminded of the existing SynchronousQueue implementation, which provides a queue with size 0. That seemed inherently useless when I originally looked at it but I’ve since found it to be one of the most useful queue implementations in the collections framework, specifically for this use case of handing off an item from one thread to another.
TransferQueue is more generic and useful than SynchronousQueue, however, as it allows you to flexibly decide whether to use normal BlockingQueue semantics or a guaranteed hand-off. In the case where items are already in the queue, calling transfer will guarantee that all existing queue items will be processed before the transferred item. Doug Lea says that, capability-wise, LinkedTransferQueue is actually a superset of ConcurrentLinkedQueue, SynchronousQueue (in "fair" mode), and unbounded LinkedBlockingQueues. And it's made better by allowing you to mix and match those features as well as take advantage of higher-performance implementation techniques.
Joe Bowbeer helpfully provided a link to a paper by William Scherer, Doug Lea, and Michael Scott that lays out the LinkedTransferQueue algorithms and performance tests showing their improvements over the existing Java 5 alternatives. LinkedTransferQueue outperforms SynchronousQueue by a factor of 3x in unfair mode and 14x in fair mode. Because SynchronousQueue is used as the heart of task hand-off in something like ThreadPoolExecutor, this can result in similar kinds of performance improvements there. Given the importance of executors in concurrent programming, you start to see the importance of adding this new implementation.
The Java 5 SynchronousQueue implementation uses dual queues (for waiting producers and waiting consumers) and protects both queues with a single lock. The LinkedTransferQueue implementation uses CAS operations to form a nonblocking implementation and that is at the heart of avoiding serialization bottlenecks. The paper lays out a lot more detail and data – if you’re interested in the details it’s worth a read.

Java Concurrency Bugs #5: inconsistent synchronization

13 May 2009
A classic Java concurrency bug that you still see pop up frequently is inconsistent synchronization. The Java Memory Model defines the semantics that a Java programmer can expect in a *properly synchronized program*. The JMM specifies that a write made under synchronization will be visible to a subsequent read under synchronization on the same monitor. A subsequent read NOT under synchronization has no such guarantees.
Typically this shows up in code where an attribute is written under synchronized but read without it:
public class Broken {
  private int value;
  public synchronized void setValue(int value) {
    this.value = value;
  }
  public int getValue() {
    return this.value;
  }
}
If one thread calls setValue() there is no guarantee that subsequent reads in other threads via getValue() will ever see that value. Probably the most common version of this I’ve run across (and ok, written myself) is with a HashMap:
public class SomeData {
  private final Map data = new HashMap();

  public void set(String key, String value) {
    synchronized(data) {
      data.put(key, value);
    }
  }

  public String get(String key) {
    return data.get(key);
  }
}
This is exactly the same problem – set() is synchronized, get() is not. However, HashMap has a far more insidious problem than the simple integer above. Hash map gets are only ~O(1) when the number of internal bucket collisions in the map is constrained (worst case, it devolves into a linked list at O(n)). To address this problem, HashMap uses the size and load factor to determine when it needs to rehash. A rehash causes new buckets to be created and all entries to be assigned to a new bucket by re-evaluating the hash function. And this is where the problem comes in with HashMap.get():
public V get(Object key) {
    if (key == null)
        return getForNullKey();

    int hash = hash(key.hashCode());
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {

        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
            return e.value;
    }
    return null;
}
You’ll notice that get() contains a loop that walks through the collision list in a bucket. It’s entirely possible if you are reading this list while a rehash occurs that you can hit the termination condition for the loop incorrectly and miss the value. Older versions of this code contained a while(true) {} loop that could actually trigger an infinite loop. I’m not sure that’s possible in this newer version but in any case you can certainly get incorrect results from get().
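Back to the SomeData example above: one straightforward fix is to synchronize reads on the same monitor as writes (ConcurrentHashMap or Collections.synchronizedMap would also do):

```java
import java.util.HashMap;
import java.util.Map;

public class SomeData {
    private final Map<String, String> data = new HashMap<>();

    public void set(String key, String value) {
        synchronized (data) {
            data.put(key, value);
        }
    }

    // Reads now synchronize on the same monitor as writes, so the JMM
    // guarantees this thread sees any previously completed set().
    public String get(String key) {
        synchronized (data) {
            return data.get(key);
        }
    }
}
```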
One exception case for unsynchronized gets on a HashMap is with a read-only Map created during construction. In this case, you are subject to final field freeze semantics. These guarantee that if you declare the HashMap field as final, fill the map during construction, never modify that map or those objects after that, and safely publish the object, then all threads that subsequently read from the map are guaranteed to see the values in the map, even without synchronization.
An example:
public class Stooges { 
    private final Map stooges = new HashMap();

    public Stooges() {
        stooges.put("Larry", "Fine");
        stooges.put("Curly", "Howard");
        stooges.put("Moe", "Howard");
    }

    // This is safe (as long as stooges can't be modified elsewhere)
    public String getStoogeLastName(String name) {
        return stooges.get(name);
    }
}
Previous concurrency bug articles: Poll – #1 – #2 – #3 – #4
PS. I should make a blatant plug here for two Java concurrency things. First, I've written a Java concurrency refcard for DZone that is currently in editing and should be out in June (free, as with all refcards). I spent a lot of time on it and I'm pretty proud of how the content turned out (many, many thanks to those who sent ideas and suggestions on Twitter).
Second, I’ll be doing a talk at JavaOne on Java concurrency bugs like the one above. If you happen to be going to JavaOne

JavaOne: Collections Connection

03 Jun 2009
For my last serious BOF of the night, I went to the Collections Connection with Josh Bloch, Martin Bucholz, and Kevin Bourillion. The under-stated goal of said BOF was to “Achieve world domination with the Java Collections Framework”. I believe they got 42.9% of the way there. In all, it was an extremely useful roundup of what’s coming down the pike in Java 7 collections.
Collection literals and array-like access – I covered this in the previous Project Coin blog – these are currently under consideration for Coin, so cross your fingers. Examples:
List<String> list = ["a", "b", "c"];
String firstElement = list[0];
Map<Integer, String> map = {1 : "One"};

List<String> list = ...;
String firstElement = list[0];
Map<Integer, String> map = ...;
map[1] = "one";
fork/join from JSR 166y – it’s likely that the fork/join framework (but not the ParallelArray lib) will be included in Java 7.
This library provides fine-grained parallelism by recursively breaking down tasks and accumulating results.
TransferQueue – also part of JSR 166y, the TransferQueue interface and its implementation LinkedTransferQueue are basically extensions to BlockingQueue that provide an additional method, tryTransfer(), which adds a non-blocking way to transfer a task to a thread that might be waiting on the queue. If no thread is waiting, the method returns immediately.
The big win here is that this allows ThreadPoolExecutor to be improved – currently it will spin up new threads even if there is a thread idle that could be used. There are good reasons for this but TransferQueue will eliminate those reasons.
I blogged about this some more a while back.
ConcurrentLinkedQueue improvements – Martin talked about several improvements being made to ConcurrentLinkedQueue:
  • When adding a new node to the end of the queue, the current implementation requires two volatile writes. With some trickery and the verboten use of sun.misc.Unsafe, they can trim that down to a single volatile write.
  • The current implementation can happen to leak nodes when the queue is being modified and there are some issues that can occur where the object structure is bad for GC (I blogged about this a while back).
  • Head and tail pointers can lag behind the actual head and tail in the current implementation, which is kind of a feature. I think Martin said they would update on every other modification to cut down on these moves, but I'm not sure that's right.
Improved Collections/Arrays.sort() – Josh has been working on taking advantage of the really awesome sort work done by Tim Peters in Python. This improvement has some wonderful properties. It is a stable, adaptive, iterative mergesort and it is written to take advantage of any partial order in the list being sorted – the order doesn’t have to be complete; any order is helpful. In a partially sorted list, you can see dramatic speed improvements. In the case of random lists, the performance is approximately the same as the existing sort implementation.
As a bonus, the memory usage is also reduced. In the case of mostly sorted lists, only a few extra elements are needed and in worst case n/2 elements are needed (as compared to current n).
Phasers – I’ve written about these in the past – basically improved CyclicBarriers.
ConcurrentReferenceMap – I asked about this in the Q&A and it sounds promising that Bob Lee and Doug Lea will have something put together for the ConcurrentReferenceMap based on the Google Collections MapMaker builder.
Google collections – Kevin Bourillion did a very high level overview of some nice stuff in the Google collections library, which is 1.0 RC2 as of today. Hopefully many of these improvements can be moved into the JDK in the future. Highlights:
  • Immutable collections
  • Multisets, multimaps
  • MapMaker – for creating maps that are optionally concurrent, have soft/weak/strong keys/values, timed expiration, and lazy value computation

Entry set puzzler

11 Jun 2009
My colleague Chris Dennis sent this puzzler around today:
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class EntrySetSize {

  public static void main(String[] args) {
    printSetSize(new ConcurrentHashMap());
    printSetSize(new HashMap());
  }

  private static void printSetSize(Map map) {
    map.put("hello", "world");
    map.put("world", "hello");
    Set set = new HashSet(map.entrySet());
    System.out.print(set.size() + " ");
  }
}
The question is what does this program print when executed?
  • a) 2 2
  • b) 1 2
  • c) 0 2
  • d) something else

It seems like the answer should be a) and you will actually see that in JDK 1.6+. However, in JDK 1.5 you'll see b). Understanding why requires a bit of a jaunt through the CHM code, where dragons lurk. Also, it's actually possible to see several possible answers (at least a and b) in JDK 1.5 depending on what keys and values you use.
**map.entrySet()**

This call on a CHM does the following:

public Set<Map.Entry<K,V>> entrySet() {
    Set<Map.Entry<K,V>> es = entrySet;
    return (es != null) ? es : (entrySet = (Set<Map.Entry<K,V>>) (Set) new EntrySet());
}
All that does is create (and cache) an EntrySet instance, which is an inner (non-static) class specific to CHM. It has access to the real CHM state and implements the interface using that state. 

**new HashSet()**

The HashSet is constructed with the EntrySet instance. A new HashSet with an initialization collection will do this:

public HashSet(Collection<? extends E> c) {
    map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
    addAll(c);
}
So, the state from the source collection is added through addAll().

**HashSet.addAll()**

public boolean addAll(Collection<? extends E> c) {
    boolean modified = false;
    Iterator<? extends E> e = c.iterator();
    while (e.hasNext()) {
        if (add(e.next()))
            modified = true;
    }
    return modified;
}
There is some extra stuff here to manage the modified flag that addAll() returns, but basically this code walks through the collection's iterator calling next(), then add() on the set under construction. Here's where things get interesting. We need to look at what the EntrySet uses for an iterator and what happens when you call next().

**EntrySet.iterator()**

public Iterator<Map.Entry<K,V>> iterator() {
    return new EntryIterator();
}
Here we return yet another inner non-static class EntryIterator. EntryIterator is defined as follows:

final class EntryIterator extends HashIterator implements Map.Entry<K,V>, Iterator<Entry<K,V>>

This should give you some pause. We fully expect to see "extends HashIterator" – there are several key/value/entry iterator flavors buried in here and they share most of their implementation in HashIterator. We also fully expect to see "implements Iterator<Entry<K,V>>" since that's what we're doing. But seeing "implements Map.Entry<K,V>" should make you go huh? This iterator walks over Entry<K,V> instances in the CHM yet it itself is typed as an entry?

What does it do on Iterator.next()?

**EntryIterator.next()**

public Map.Entry<K,V> next() {
    nextEntry();
    return this;
}

Here we see the iterator advance to the next entry (it walks through segments at the high level and entries at the low level) but in the end it returns **this**? 

The key here is that the EntrySet's iterator just stores the current key/value during iteration and returns itself as the entry object for every single actual entry. If you modify that entry, it writes through appropriately to the CHM. Presumably, this saves on the creation and destruction of lots of little Map.Entry instances that will be used just during iteration and typically never looked at again.

Therefore, when the EntrySet iterator is walked through in HashSet.addAll(), AND because of how EntryIterator.hashCode() is implemented:

public int hashCode() {
    // If not acting as entry, just use default.
    if (lastReturned == null)
        return super.hashCode();

    Object k = getKey();
    Object v = getValue();
    return ((k == null) ? 0 : k.hashCode()) ^
           ((v == null) ? 0 : v.hashCode());
}
Here the hash code is computed as the bitwise XOR of the key hash code and the value hash code. And regardless of what the key and value hash codes are, a ^ b == b ^ a. In our case we used the two strings "hello" and "world" swapped. Thus the second entry instance will be seen as having an identical hash code (and being the same instance) and will simply be put over the first entry instance in the HashSet.
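You can verify that symmetry directly:

```java
public class XorHash {
    public static void main(String[] args) {
        // Entry hash codes combine key and value hashes with XOR, and XOR
        // is commutative: swapping key and value yields the same hash.
        int h1 = "hello".hashCode() ^ "world".hashCode();
        int h2 = "world".hashCode() ^ "hello".hashCode();
        System.out.println(h1 == h2);  // true
    }
}
```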

A subtly different bug will happen in the case where this property (of swapped key/value) doesn't occur. In that case, both entries will end up in the HashSet (size == 2 as expected) but they are the _same_ instance and were put into the set with different hash codes. It's always a bad deal to put something in a HashSet or HashMap and then modify the object in ways that change its hashCode(). This breaks assumptions in the HashMap class that can cause lost elements in the collection and other weird behavior.

As I mentioned, this same-instance behavior is probably surprising to most people. It's doc'ed in [Sun bug #6312706](http://bugs.sun.com/view_bug.do?bug_id=6312706) as a general problem in several maps (CHM, IdentityHashMap, and EnumMap). That bug is marked as "fix understood", which is kind of surprising because this behavior is already changed and fixed in JDK 1.6. In 1.6, the EntryIterator is implemented as:

final class EntryIterator
    extends HashIterator
    implements Iterator<Entry<K,V>>
{
    public Map.Entry<K,V> next() {
        HashEntry<K,V> e = super.nextEntry();
        return new WriteThroughEntry(e.key, e.value);
    }
}
Here EntryIterator does not implement Map.Entry and next() returns a new instance of WriteThroughEntry, which causes the HashSet to be much happier on creation.

Interestingly, another smart colleague noted that the existing Terracotta ConcurrentHashMap instrumentation fixes the 1.5 implementation by creating a wrapper object in the iterator as well.